Chapter 3
STOCHASTIC  MODELS  OF  CONSUMER  RESPONSE 

In  general,  stochastic  models  of  consumer  behavior  have  been  confined  to 
product  markets  which  are  composed  of  frequently  purchased  consumer  items.   The 
reason  for  this  tendency  to  restrict  the  models  to  a  subset  of  total  consumer 
behavior  becomes  clear  when  we  note  that  stochastic  models  generally  require  data 
on  a  sequence  of  several  purchases  for  each  household  or  individual  included 
in  the  analysis.   Consequently,  this  discussion  will  implicitly  assume  that  we 
are  concerned  with  frequently  purchased  (generally  branded)  consumer  goods. 

Given  this  restriction,  what  behavior  or  responses  would  we  ideally  like 
to  account  for  in  our  models?   In  the  most  general  case  we  would  like  our 
models  to  describe  or  account  for: 

1.  Brand  choice 

2.  Interpurchase  timing 

3.  Quantity  purchased 

4.  Store  of  purchase 

5.  The  impact  of  marketing  variables  such  as  price,  deals,  advertising, 
in-store  displays,  etc. 

To date, the greatest attention has focused on item 1, although some progress has
been made in dealing with the other factors.

Stochastic models allow for the multitude of factors which affect consumer
behavior by means of response uncertainty. Thus the problem of describing and
predicting consumer behavior in a stochastic model is reduced to the problem of
specifying a probability law for the behavior of interest. Note that the
specification of the probability law itself may contain important factors from
the behavioral situation. For example, linear learning models incorporate the
assumption that the entire history of brands purchased conditions the brand
choice probability on any given purchase occasion. Consumer behavior may often
be described by relatively simple stochastic models where exceedingly complex
deterministic models would otherwise be required.

See Coleman [1964b]. This discussion draws upon Massy [1966].

In the remainder of this section we shall give further consideration to
desirable characteristics of stochastic models, to problems in modeling consumer
behavior, to some examples of stochastic model applications, and to
considerations for future research.

1. Desirable Properties of Stochastic Models

In  stochastic  models  of  consumer  behavior  we  generally  have  some  set  (or 
a  continuum)  of  alternative  responses.   For  example,  a  set  of  discrete  response 
alternatives  might  be  the  individual  brands  available  for  the  product  of 
interest.   An  example  of  a  continuous  response  would  be  the  time  between 
successive  purchases.   In  any  case,  in  stochastic  models  we  generally  have  one 
or  more  response  variables  with  two  or  more  levels  each.   The  actual  outcome  or 
response  at  any  given  response  occasion  (e.g.,  purchase  of  the  product)  is  the 
result  of  a  probability  process.   We  shall  term  the  probability  of  any  particular 
outcome  the  response  probability  of  that  outcome. 

The  properties  we  would  like  our  stochastic  models  to  have  may  be 
meaningfully  segmented  into  two  groups:   model  properties  and  statistical 
properties.   These  properties  will  first  be  outlined  and  then  discussed  below: 

Model  Properties 

1.  Heterogeneity  of  consumers 

2.  Non-stationarity  of  response  probabilities 

3.  Measures  of  interest 
4. Ability to model multi-alternative markets

5. Ability to handle inter-response time problems

Statistical Properties

1. Data based

2. A test of goodness of fit

3. Computationally feasible parameter estimates having known properties

4. Empirically viable for at least one set of market data

At the present time, no model has achieved all of these properties. Nevertheless,
recent work in stochastic models has greatly generalized our previous ability
to model consumer behavior and, hopefully, even more rapid advances will be
forthcoming in the future.

Heterogeneity.   At  any  given  response  occasion,  consumers  will  tend  to 
differ  from  one  another  in  terms  of  their  response  probabilities.   We  would 
like  to  have  our  models  account  for  this  distribution  of  response  probability 
in  the  consumer  population.   In  Section  2  we  discuss  certain  errors  of 
inference  which  occur  when  we  treat  a  heterogeneous  population  of  consumers 
as  homogeneous  with  respect  to  their  response  probabilities. 

Non-stationarity. A response probability is said to be stationary if it
does not change from response occasion to response occasion. In some situations
stationarity may be a reasonable assumption in the very short run. However, in
most markets, and in virtually all markets in the long run, consumers are likely
to change their response probabilities. These changes may occur as the result of
marketing activities (e.g., price, deals, promotion), changes in family
circumstances (e.g., children leaving home, the family moving to a new
community), or product experience. In any case, the likelihood that a consumer's
response probability will change is high, and thus we would like our models at
least to be able to detect these changes. Non-stationarity may also cause errors
of inference if we have assumed stationarity. This topic is pursued in Section 2.

Measures of Interest. Models generally yield market measures of interest to
the market researcher or the marketing manager. For example, Markov models
may be used to predict expected brand shares at equilibrium (or the steady
state). In any case, we would like our models to yield interesting summary
measures of market characteristics and dynamics. It should be pointed out,
however, that such measures may not be of primary interest in some cases. For
example, our primary interest may center upon the structural question of whether
consumers exhibit linear learning behavior in a brand market or some form of
non-stationary, heterogeneous Bernoulli behavior. In this case the structural
hypothesis is the focus of interest rather than the measures available from
these alternative models.
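The equilibrium-share calculation mentioned above can be sketched in a few lines of Python. This is a minimal illustration, assuming a first-order Markov chain with a known transition matrix; the three-brand matrix is hypothetical and not taken from any study discussed in this chapter. The function simply applies the matrix repeatedly to a vector of brand shares until the shares stop changing (power iteration).

```python
def steady_state(P, tol=1e-12, max_iter=10_000):
    """Equilibrium (steady-state) shares of a Markov chain.

    P is a row-stochastic transition matrix: P[i][j] is the probability that
    a purchase of brand i is followed by a purchase of brand j. Assumes the
    chain is regular, so repeated multiplication converges from any start.
    """
    n = len(P)
    shares = [1.0 / n] * n  # start from equal shares
    for _ in range(max_iter):
        # One step of the chain: new_j = sum_i shares_i * P[i][j]
        new = [sum(shares[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, shares)) < tol:
            return new
        shares = new
    raise RuntimeError("shares did not converge; is the chain regular?")


# Hypothetical three-brand transition matrix (rows: brand bought at t,
# columns: brand bought at t+1); each row sums to 1.
P = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]
print([round(s, 4) for s in steady_state(P)])  # → [0.3, 0.5, 0.2]
```

For these hypothetical numbers the equilibrium shares are 0.3, 0.5, and 0.2 for brands A, B, and C, whatever the starting shares. Power iteration is used here only for transparency; solving the linear equations for the stationary vector directly would give the same answer.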

Multi-Alternative  Markets.   Markets  for  frequently  purchased  consumer 
items  are  multi-brand  markets.   Whenever  possible,  we  would  like  to  be  able  to 
model  consumer  behavior  toward  several  of  the  brands  in  a  market.   In  most  of 
these  markets,  however,  there  are  over  one  hundred  different  brands  ranging 
from  national  and  regional  brands  down  to  the  private  labels  of  the  chain 
stores.   Although  a  model  may  be  able  to  encompass  a  very  large  number  of  brands 
conceptually,  data  requirements  impose  a  constraint  on  the  number  of  brands  a 
model  may  encompass.   The  brands  which  aren't  explicitly  modeled  are  generally 
lumped  into  an  "all  other"  category.   The  problems  involved  in  this  procedure  are 
discussed  in  Section  2  under  the  combining  of  classes  question. 

Inter-Response Time Problems. Unfortunately, consumers do not cooperate
with our model-building attempts by purchasing on a deterministic cycle (e.g.,
weekly or monthly). For models which consider the sequence of outcomes on
successive response occasions without explicitly considering inter-response
time, this creates certain problems. In particular, estimates of total market
size or expected market share become hazardous. The semi-Markov model
explicitly considers these questions. In other models an ad hoc procedure is
often used to overcome these problems at least partially. Once again, if our
interest centers upon the structure of behavior related to a sequence of
responses, this problem will not be of major importance.

Data  Based.   If  our  models  are  to  be  useful  and  empirically  verifiable, 
they  should  be  data  based.   That  is,  the  models  should  relate  to  data  which 
either  are  available  or  else  may  reasonably  be  obtained. 

Goodness of Fit. Before we may interpret a set of data in terms of one of
our models, we must first devise some means whereby we may test the "goodness
of fit" or descriptive adequacy of the model. This requirement of sound
model-building procedure has been notable in its breach in marketing as well
as in the behavioral sciences. See the papers by Montgomery and Morrison for
a discussion of the chi-square "goodness of fit" test.²
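As a minimal sketch of the chi-square goodness-of-fit idea, the function below computes Pearson's statistic, the sum over response categories of (observed - expected)^2 / expected. The counts are hypothetical. The statistic is compared with a chi-square critical value whose degrees of freedom equal the number of categories minus one, minus the number of parameters estimated from the same data.

```python
def chi_square_statistic(observed, expected_probs):
    """Pearson chi-square statistic for observed counts vs. model probabilities."""
    if len(observed) != len(expected_probs):
        raise ValueError("observed and expected_probs must have the same length")
    n = sum(observed)
    stat = 0.0
    for o, p in zip(observed, expected_probs):
        e = n * p  # expected count in this category under the model
        stat += (o - e) ** 2 / e
    return stat


# Hypothetical data: a model predicts that Brand A receives half of 100
# purchases, but 60 of the observed purchases are Brand A.
stat = chi_square_statistic([60, 40], [0.5, 0.5])
print(stat)  # → 4.0
# Two categories and no estimated parameters give 1 degree of freedom;
# the 5 percent critical value of chi-square with 1 d.f. is about 3.841.
print("reject" if stat > 3.841 else "do not reject")  # → reject
```

In a real application the expected probabilities would come from the fitted stochastic model, and very small expected counts in any category would call the chi-square approximation into question.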

Parameter Estimates. Stochastic models of consumer behavior generally
depend upon one or more parameters which must be estimated from data in order
to identify the model. Since an infinity of functions of the data may be
proposed as an estimate of any given parameter, we should choose the estimator
which has desirable statistical properties. See Appendix [ ] for a discussion
of statistical properties of estimators.³ Computational feasibility of the
estimator is another important consideration. Numerical procedures are often
available when analytic techniques break down. See Appendix [ ] for a
discussion of numerical procedures.

² See Massy, Montgomery, and Morrison [1967, Chapter 2] for a more complete
discussion.

³ Ibid., Section 2.2.
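The point about numerical procedures can be illustrated with the simplest possible case: the maximum likelihood estimate of a single consumer's Bernoulli parameter p from k purchases of Brand A in n trials. Here the analytic answer, k/n, is known, which makes it a convenient check on a numerical search. The bisection routine below is a generic illustration under that assumption, not the procedure described in the appendix referred to above.

```python
def bernoulli_mle_bisection(k, n, tol=1e-12):
    """Numerically maximize the Bernoulli log-likelihood
    k*log(p) + (n - k)*log(1 - p) over p in (0, 1).

    The derivative (score) k/p - (n - k)/(1 - p) is strictly decreasing, so
    the maximum is where it crosses zero; bisection brackets that crossing.
    Assumes 0 < k < n, so that the maximum lies strictly inside (0, 1).
    """
    if not 0 < k < n:
        raise ValueError("need 0 < k < n for an interior maximum")
    lo, hi = 1e-12, 1 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2
        score = k / mid - (n - k) / (1 - mid)
        if score > 0:
            lo = mid  # likelihood still increasing: maximum lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


k, n = 13, 20  # hypothetical: 13 Brand A purchases in 20 trials
print(round(bernoulli_mle_bisection(k, n), 6), k / n)  # → 0.65 0.65
```

The numerical search agrees with the closed form here; its value lies in models where no closed form exists and the same bracketing idea (or a more general optimizer) must be applied to a more complicated likelihood.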

Empirical Viability. To remain a candidate model of consumer behavior, it
seems reasonable to require that a model be demonstrated to be empirically
viable for at least one set of market data. That is, the model should be
consistent with at least one set of data.

2. Problems in Modeling Consumer Behavior

Several  factors  complicate  the  life  of  the  management  scientist  seeking 
to  build  and  test  stochastic  models  of  consumer  behavior.   They  are: 

1. There may be a many-to-one mapping of models into a set of data

2. There may be a confounding of the effects of heterogeneity and
non-stationarity of response probability

3. The stochastic process generating the response probabilities may
itself undergo change

4. The combining of classes problem, which arises when an N-alternative
market is collapsed into a two-alternative market

These  factors  are  discussed  below. 

The first factor, a many-to-one mapping of models into a set of data,
reflects the fact that several alternative models may, in fact, be consistent
with the data. That is, several structurally different models may prove to be
plausible descriptive models of the actual outcomes we have observed.
Consequently, there is a need to develop methods which discriminate among
competing models. For some work along these lines see the paper by Morrison.
A more extensive discussion is available in Massy, Montgomery, and Morrison
[1967, Chapter 2]. In any case, the fact that several models may be consistent
with a set of data should temper our enthusiasm when we find a particular model
to be an excellent fit to the data.
The second factor considerably complicates the life of the management
scientist. The problems which can arise are illustrated below. Many models
assume that the population of consumers is homogeneous, at least with respect
to their probability of making any one of the alternative responses. This
assumption of homogeneity may lead us to conclude that the process which
generates the responses is of relatively high order (i.e., that there is
considerable dependence between responses) when in fact the process is of low
(even zero) order. A numerical example will illustrate the difficulty. Suppose
we have a population of consumers consisting of two types. Type I consumers
comprise forty percent of the population, and each has a probability of 0.8 of
purchasing Brand A rather than Brand B on any given purchase occasion. Type II
consumers comprise sixty percent of the population, and each has a probability
of 0.3 of purchasing Brand A. This amounts to assuming that within each type,
consumers choose brands according to a Bernoulli process. Some useful notation
is summarized below.
Notation

$P(\mathrm{I}) = 0.4$, the probability that a consumer chosen at random from the
population will be a Type I consumer.

$P(\mathrm{II}) = 0.6$, the probability that a consumer is Type II.

$P(A_t \mid \mathrm{I}) = 0.8$, the probability that a Type I consumer buys Brand A at time $t$.

$P(A_t \mid \mathrm{II}) = 0.3$, the probability that a Type II consumer buys Brand A at time $t$.

$P(A_t \mid A_{t-1})$, the probability that a consumer who has purchased Brand A at
time $t-1$ will purchase Brand A at time $t$.

Suppose we now draw a consumer at random from the total population. What
is the probability that he will purchase Brand A on purchase occasion $t$? It is
just

$$
\begin{aligned}
P(A_t) &= P(A_t \mid \mathrm{I})\,P(\mathrm{I}) + P(A_t \mid \mathrm{II})\,P(\mathrm{II}) \\
&= (0.8)(0.4) + (0.3)(0.6) \\
&= 0.50
\end{aligned}
\tag{3-1}
$$

Remember that we do not know whether he is really a Type I or a Type II consumer.
Now suppose that we observe this consumer over two successive purchases, say at
times zero and one. What is the probability that he will purchase Brand A at
time one given that he purchased it at time zero? It is

$$
P(A_1 \mid A_0) = P(A_1 \mid \mathrm{I})\,P(\mathrm{I} \mid A_0) + P(A_1 \mid \mathrm{II})\,P(\mathrm{II} \mid A_0)
\tag{3-2}
$$

where $P(\mathrm{I} \mid A_0)$ is the probability that a consumer who purchased Brand A
at time zero is a Type I consumer. Similarly, we have $P(\mathrm{II} \mid A_0)$. Since we
already know $P(A_1 \mid \mathrm{I})$ and $P(A_1 \mid \mathrm{II})$, we may compute
$P(A_1 \mid A_0)$ if we can determine $P(\mathrm{I} \mid A_0)$ and $P(\mathrm{II} \mid A_0)$.
By Bayes' theorem we have

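$$
\begin{aligned}
P(\mathrm{I} \mid A_0) &= \frac{P(A_0 \mid \mathrm{I})\,P(\mathrm{I})}{P(A_0 \mid \mathrm{I})\,P(\mathrm{I}) + P(A_0 \mid \mathrm{II})\,P(\mathrm{II})} = \frac{P(A_0 \mid \mathrm{I})\,P(\mathrm{I})}{P(A_0)} \\
&= \frac{(0.8)(0.4)}{0.5} = 0.64
\end{aligned}
$$

and

$$
P(\mathrm{II} \mid A_0) = \frac{P(A_0 \mid \mathrm{II})\,P(\mathrm{II})}{P(A_0)} = \frac{(0.3)(0.6)}{0.5} = 0.36
$$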

Thus we have for (3-2) that

$$
P(A_1 \mid A_0) = (0.8)(0.64) + (0.3)(0.36) = 0.62
\tag{3-3}
$$

From (3-1) and (3-3) we see that a purchase of Brand A at time zero has seemingly
increased the probability of a purchase of Brand A at time one, from 0.50 to 0.62.
Can we infer from this that the consumer has learned to purchase Brand A as a
result of having purchased it once? The answer, of course, is no. The actual
probability of the consumer purchasing Brand A at time one is independent of his
purchase at time zero by the very assumption upon which we based our probability
calculations.

How then do we account for this apparent effect of the purchase at time
zero upon the probability of purchasing Brand A at time one? The fact that
the consumer (chosen at random) purchased Brand A at time zero makes it more
likely that he is a Type I consumer: the probability rises from 0.4 to 0.64.
Since Type I consumers have a high probability of purchasing Brand A, and since
the consumer we have chosen is rather likely to be a Type I consumer, we would
expect the probability of purchasing Brand A at time one to be greater than the
average probability in the entire population of consumers. Thus heterogeneity in
the population has created this apparent dependence effect.
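The argument can also be checked by simulation. The sketch below is a hypothetical Python illustration: it draws consumers from the two-type population described above, lets each make two purchases that are independent given his type, and then pools everyone together as an analyst assuming homogeneity would. The pooled estimate of $P(A_1 \mid A_0)$ comes out near 0.62 even though no individual's choice depends on his past purchase.

```python
import random


def simulate_two_purchases(n_consumers, seed=0):
    """Monte Carlo version of the two-type example.

    Type I (40 percent) buys Brand A with probability 0.8 and Type II
    (60 percent) with probability 0.3, independently on every occasion, so
    no individual consumer's choice depends on his past purchases.
    Returns pooled estimates of P(A_0) and P(A_1 | A_0).
    """
    rng = random.Random(seed)
    a_at_zero = 0
    a_at_both = 0
    for _ in range(n_consumers):
        p = 0.8 if rng.random() < 0.4 else 0.3  # draw the consumer's type
        bought_a0 = rng.random() < p            # purchase at time zero
        bought_a1 = rng.random() < p            # time one: same p, independent draw
        if bought_a0:
            a_at_zero += 1
            if bought_a1:
                a_at_both += 1
    return a_at_zero / n_consumers, a_at_both / a_at_zero


p_a0, p_a1_given_a0 = simulate_two_purchases(200_000)
print(round(p_a0, 3), round(p_a1_given_a0, 3))  # near 0.50 and 0.62
```

Setting both type probabilities equal (a homogeneous population) in this sketch would make the two printed values agree, up to sampling error.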

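As a further illustration we might also compute

$$
\begin{aligned}
P(A_1 \mid B_0) &= P(A_1 \mid \mathrm{I})\,P(\mathrm{I} \mid B_0) + P(A_1 \mid \mathrm{II})\,P(\mathrm{II} \mid B_0) \\
&= (0.8)(0.16) + (0.3)(0.84) \\
&= 0.38
\end{aligned}
\tag{3-4}
$$

where, again by Bayes' theorem, $P(\mathrm{I} \mid B_0) = (0.2)(0.4)/0.5 = 0.16$ and
$P(\mathrm{II} \mid B_0) = (0.7)(0.6)/0.5 = 0.84$.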
Using (3-3) and (3-4), a management scientist assuming that he was dealing with
a first-order Markov chain model for a homogeneous population would compute
the following aggregate transition matrix (rows: purchase at time $t$; columns:
purchase at time $t+1$):

$$
\begin{array}{c|cc}
 & A & B \\ \hline
A & 0.62 & 0.38 \\
B & 0.38 & 0.62
\end{array}
$$

He would erroneously conclude that a purchase of Brand A at time $t$ enhances the
probability that the brand will be purchased at time $t+1$. We see from this that
such an aggregate transition matrix can only exhibit the combined effect of
heterogeneity and past purchases.
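The aggregate matrix can also be obtained exactly for any mixture of zero-order (Bernoulli) consumer types. The sketch below uses exact fractions to avoid rounding and works with joint probabilities, $P(A_1 \mid A_0) = P(A_0 \text{ and } A_1)/P(A_0)$, which is algebraically the same as the Bayes' theorem route of (3-2); the function name and its interface are illustrative only.

```python
from fractions import Fraction


def aggregate_transition(types):
    """Aggregate first-order 'transition matrix' implied by a mixture of
    zero-order (Bernoulli) consumer types.

    types: list of (population share, probability of buying Brand A).
    Returns [[P(A1|A0), P(B1|A0)], [P(A1|B0), P(B1|B0)]].
    """
    p_a0 = sum(w * p for w, p in types)            # equation (3-1)
    p_aa = sum(w * p * p for w, p in types)        # P(A at 0 and A at 1)
    p_ba = sum(w * (1 - p) * p for w, p in types)  # P(B at 0 and A at 1)
    a_after_a = p_aa / p_a0                        # equation (3-3)
    a_after_b = p_ba / (1 - p_a0)                  # equation (3-4)
    return [[a_after_a, 1 - a_after_a], [a_after_b, 1 - a_after_b]]


# Type I: 40 percent of consumers, P(A) = 0.8; Type II: 60 percent, P(A) = 0.3.
types = [(Fraction(4, 10), Fraction(8, 10)), (Fraction(6, 10), Fraction(3, 10))]
for row in aggregate_transition(types):
    print([float(x) for x in row])
# → [0.62, 0.38]
# → [0.38, 0.62]
```

With a single, homogeneous type, for example `aggregate_transition([(Fraction(1), Fraction(3, 10))])`, both rows equal [3/10, 7/10], so no apparent dependence arises; the pattern in the matrix above is produced entirely by mixing the two types.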

If the probability mechanism which generates responses changes over time,
we may once again erroneously reject a true Bernoulli process. For example,
suppose we take a Type II consumer from our previous discussion. We have
assumed that he purchases Brand A with probability 0.3. Now suppose that we
observe his brand choices over a sequence of thirty trials and that on the
sixteenth trial (halfway through the sequence) Brand A becomes his favorite.
That is, he becomes a Type I consumer with probability 0.8 of purchasing Brand A
on any given purchase occasion. Now suppose we apply one of the standard tests
of the Bernoulli hypothesis, such as the Wald-Wolfowitz run test. This test
assumes stationarity and will generally be misleading when applied to a
non-stationary process. In the present case we would expect our estimate of the
Bernoulli parameter p to be (0.8 + 0.3)/2 = 0.55 for the entire series of
thirty purchases. Since the true p was 0.3 for the first fifteen trials and
0.8 for the second fifteen, the series of thirty responses would be expected
to contain fewer, longer runs than a stationary process with p = 0.55 would
produce, and so to "fail" this run test. That is, the Bernoulli hypothesis
would be rejected. Thus the one-shot change in the Bernoulli parameter may lead
us to erroneously reject the Bernoulli hypothesis, which in this case we have
assumed to be true. Hence there is a need for methods which are able to account
for heterogeneity and which assume no more than short-run stationarity.
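A minimal sketch of the run test makes the argument concrete. The function below counts runs and applies the usual large-sample normal approximation to the Wald-Wolfowitz run count, with mean $2n_1n_2/n + 1$ and variance $2n_1n_2(2n_1n_2 - n)/(n^2(n-1))$; for short sequences the exact tables in Siegel would be preferable. The example consumer is hypothetical and follows the switch described above, with p = 0.3 for the first fifteen trials and p = 0.8 thereafter.

```python
import math
import random


def runs_test_z(seq):
    """Wald-Wolfowitz run test for a two-symbol sequence (normal approximation).

    Returns (number of runs, z statistic). A strongly negative z means fewer
    runs, i.e. longer streaks, than a stationary Bernoulli process would give.
    """
    symbols = sorted(set(seq))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")
    n1 = seq.count(symbols[0])
    n2 = seq.count(symbols[1])
    n = n1 + n2
    runs = 1 + sum(1 for x, y in zip(seq, seq[1:]) if x != y)
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    return runs, (runs - mean) / math.sqrt(var)


# Hypothetical consumer: P(A) = 0.3 for trials 1-15, then 0.8 for trials 16-30.
rng = random.Random(1)
seq = "".join("A" if rng.random() < (0.3 if t < 15 else 0.8) else "B"
              for t in range(30))
print(seq, runs_test_z(seq))
```

A single simulated sequence is noisy, so a particular seed may or may not produce a significant z; repeating the simulation many times would show how often the mid-sequence shift leads the test to reject.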

The third factor, non-stationarity of the process generating the sequence
of response probabilities, arises in models which allow the response probability
itself to change. In this case we are dealing with changes in the process which
generates changes in the response probabilities. Changes in this higher-order
process will generally occur more slowly than changes in response probability.
But the possibility of these changes dictates that we once again strive to
develop methods which assume only short-run stationarity, even for models
which allow the response probability to change.

For an empirical example, see the discussion of Massy's [1966] paper in Section
3.1.

See Siegel (1956, pp. 136-145).

The fourth factor, the combining of classes problem, may arise when an
N-alternative market is collapsed into a two-alternative market. For example,
many models consider only two brands: a brand of particular interest and an
aggregate of all other brands. If this combining of all other brands into an
aggregate brand is to leave the structure of the system unchanged, then a
stochastic operator on the state space of the system must be of a special form.
This is of some importance in marketing models, since both Markov models and
linear learning models involve stochastic operators. For example, a Markov
transition matrix is a stochastic operator. The reader is referred to the
footnote references for a detailed discussion of the problem. Suffice it here
to say that the combining of classes problem would seem to be of second-order
importance when compared to the first three factors we have discussed.
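For Markov chains, the "special form" has a well-known statement: the lumpability condition of Kemeny and Snell. A chain can be collapsed onto a partition of its states, for every starting distribution, if and only if, for each pair of blocks, every state in the first block has the same total probability of moving into the second block. The sketch below checks that condition; the three-brand matrices are hypothetical, and the check says nothing about linear learning operators.

```python
from fractions import Fraction


def is_lumpable(P, blocks):
    """Check the lumpability condition for a Markov transition matrix.

    P: row-stochastic matrix (list of rows).
    blocks: partition of the state indices, e.g. [[0], [1, 2]].
    For every (source, target) pair of blocks, all states in the source
    block must have the same total probability of moving into the target.
    """
    for source in blocks:
        for target in blocks:
            into_target = {sum(P[i][j] for j in target) for i in source}
            if len(into_target) > 1:
                return False
    return True


# Exact fractions avoid spurious inequalities from floating-point rounding.
# Brand 0 is the brand of interest; brands 1 and 2 become "all other".
f = Fraction
P1 = [[f(6, 10), f(3, 10), f(1, 10)],
      [f(2, 10), f(5, 10), f(3, 10)],
      [f(2, 10), f(1, 10), f(7, 10)]]
P2 = [[f(6, 10), f(3, 10), f(1, 10)],
      [f(2, 10), f(5, 10), f(3, 10)],
      [f(4, 10), f(1, 10), f(5, 10)]]
print(is_lumpable(P1, [[0], [1, 2]]))  # → True
print(is_lumpable(P2, [[0], [1, 2]]))  # → False
```

In P2 the two "other" brands lose customers to brand 0 at different rates (0.2 versus 0.4), so the collapsed two-state process is no longer a Markov chain with a single, well-defined transition matrix.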
3.    Some  Applications  of  Stochastic  Models 

In  this  section  we  review  several  applications  of  stochastic  models  of 
consumer  behavior.   Our  discussion  will  center  around  the  three  major  model 
building  approaches  —  zero-order,  Markovian,  and  learning  —  and  will  include 
both  theoretical  and  empirical  results.   This  discussion  is  not  intended  to  be 
an  exhaustive  review,  but  rather  will  serve  to  illustrate  the  type  of  work  which 
has  been  done  in  this  area. 


In fact the combining of classes problem arises whenever an N-alternative
market is collapsed into an M-alternative market where M < N. The case where
M = 2 is the one generally used in marketing models.

See Bush, Mosteller, and Thompson [1954] or Bush and Mosteller [1955].

3.1 Zero-Order Models

Before turning to a discussion of the work which has been done on zero-order
models, we must first specify what is meant by a zero-order model. By a
zero-order model we mean one in which the response probabilities are not
affected or altered by the particular history of responses which have been
made. Consider an individual in a situation in which he may make one of two
alternative responses, A or B, on any given response occasion. Such a
situation might be a market where we let A represent some brand of interest
and B represent an aggregate "all other" brand. We let $P(A_t)$ represent his
probability of making response A at response occasion $t$. Then in a zero-order
model we would have, for example,

$$
P(A_t \mid A_{t-1}, B_{t-2}, \ldots, A_0) = P(A_t \mid B_{t-1}, B_{t-2}, \ldots, B_0) = P(A_t).
$$

In general, we have

$$
P(A_t \mid \text{some history of A's and B's at times } 0, 1, \ldots, t-1) = P(A_t),
$$

whatever the particular prior history of A's and B's may be.

3.11 Early Work on Consumer Brand Choice

Brown. Using the purchasing records of 100 families from the Chicago
Tribune consumer panel for the year 1951, Brown (1952) studied brand loyalty
behavior toward certain frequently purchased products such as toothpaste,
margarine, coffee, and soap. His measure of brand loyalty depended upon the
number and pattern of purchases of different brands during the year. Based
upon his operational measure of brand loyalty, Brown classified households as
having undivided loyalty, divided loyalty, unstable loyalty, or no loyalty.
Morrison (1965b) has observed that Brown's measure of brand loyalty does not
necessarily satisfy even the weakest necessary condition for a meaningful scale,
the property of transitivity. While hindsight and developments during the
ensuing years may tend to make one overcritical, Brown's work did reveal that
consumers concentrate their purchases much more than had been previously
expected.

For a readable and useful treatment of measurement and scaling in marketing,
see Green and Tull (1966, Chapter 7).

Cunningham.   The  definition  of  brand  loyalty  was  sharpened  in  a  later 
study  by  Cunningham  (1956).   He  operationally  defined  brand  loyalty  as  the 
proportion  of  total  purchases  within  a  product  class  that  a  household  devotes 
to  its  favorite  or  most  frequently  purchased  brand.   In  a  subsequent  study 
Cunningham  (1961)  defined  store  loyalty  in  an  analogous  manner.   The  research 
hypotheses  in  these  studies  centered  about  the  postulated  existence  of  brand 
or  store  loyalties.   The  null  hypothesis  was  that  brands  would  be  purchased 
and  stores  visited  on  an  equiprobable  basis.   That  is,  the  null  hypothesis 
was  that  no  propensity  to  purchase  particular  brands  or  to  shop  in  particular 
stores  exists.   The  Chicago  Tribune  panel  once  again  served  as  the  data  base 
for  these  studies. 
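Cunningham's operational definition translates directly into code. The sketch below computes the favorite-brand share for one household and, for comparison, a simulated average of the same measure under the equiprobable null hypothesis. The purchase history and the Monte Carlo benchmark are illustrative assumptions; they are not Cunningham's data or his test procedure.

```python
import random
from collections import Counter


def favorite_brand_share(purchases):
    """Share of a household's purchases devoted to its most frequently
    purchased brand (the loyalty measure described above)."""
    counts = Counter(purchases)
    return max(counts.values()) / len(purchases)


def equiprobable_benchmark(n_purchases, n_brands, n_sims=20_000, seed=0):
    """Average favorite-brand share when every purchase picks one of n_brands
    with equal probability (the equiprobable null), estimated by simulation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        sample = [rng.randrange(n_brands) for _ in range(n_purchases)]
        total += favorite_brand_share(sample)
    return total / n_sims


# Hypothetical household: ten purchases spread over three brands.
history = ["A", "A", "B", "A", "C", "A", "A", "B", "A", "A"]
print(favorite_brand_share(history))  # → 0.7
print(equiprobable_benchmark(len(history), n_brands=3))
```

Note that even under the null the favorite brand's share exceeds 1/n_brands, because the favorite is chosen after the fact; a fair comparison therefore uses the null distribution of the same statistic rather than 1/n_brands.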

The  results  of  these  studies  are  summarized  below: 

1. Significant brand loyalty exists within product classes (intra-class
loyalty).

2.  Loyalty  proneness  or  the  propensity  to  be  brand  loyal  across  product 
classes  does  not  exist. 

3.  Store  and  brand  loyalty  are  not  significantly  related. 

4.  Purchases  on  deals  tend  to  be  concentrated  among  households  having 
low  brand  loyalties. 

5.  Consumption  and  brand  loyalty  are  unrelated. 

6. A household's time in the panel does not relate to its loyalty behavior.

7.  There  is  more  store  loyalty  generated  toward  chain  stores  than  toward 
specialty  stores  or  independents. 

Cunningham's results suggest that families concentrate their brand and
store choices to a far greater extent than would be expected under the
equiprobable chance model. It should be noted that he implicitly assumed that
panel members made their brand choice decisions according to a stationary
Bernoulli process.

3.12 Bernoulli Models

Frank.   Frank  (1962)  analyzed  the  August,  1957  through  September,  1958 
regular  and  instant  coffee  purchases  from  the  Chicago  Tribune  panel  in  an 
effort  to  test  whether  consumer  brand  choice  can  be  described  by  a  zero-order 
process  at  the  household  level.   He  argued  that  some  of  the  "learning"  effects 
which  had  been  found  by  Kuehn  (1958)  may  in  fact  be  spurious  as  a  result  of 
the  aggregation  of  heterogeneous  consumers  whose  true  brand  switching  processes 
are  zero  order.   Recall  our  discussion  of  the  confounding  of  the  effects  of 
heterogeneity  and  non-stationarity  in  Section  2. 

A sequence of twenty consecutive purchases from each household was tested
by the Wald-Wolfowitz run test⁹ to see whether the sequence could have been
generated by a stationary Bernoulli process. Recall that by stationary we
mean that each household has a constant probability of purchasing, say, Brand A
over the twenty trials. Frank found that much of the brand choice behavior
was consistent with the Bernoulli trials hypothesis. However, he did find that
there were too many families with long runs. This result may be due to one
of the following factors:

1.  The  process  generating  the  observations  is  non-Bernoulli. 

2.  The  process  is  basically  Bernoulli  in  the  short  run,  but  the  Bernoulli 
parameter  is  non-stationary  over  longer  purchase  sequences. 

To the extent that the second factor is operating, we might erroneously reject
a true Bernoulli process which is non-stationary. Another pitfall in the
run test, in addition to the non-stationarity issue, is that for sequences of
twenty responses the run test is not particularly powerful. Consequently, the
test may accept as Bernoulli some processes which are rather far from Bernoulli
in nature. In spite of these limitations in statistical procedure, Frank's
work raises an important issue, aggregation, with respect to inferences
concerning the order of consumers' brand choice processes.

⁹ See Siegel (1956, pp. 136-145) for a description of the Wald-Wolfowitz run
test.

Massy and Frank. Using the techniques of factor analysis and simulation,
Massy and Frank (1964) investigated the relationship between overt measures of
consumers' brand and store switching behavior and the underlying structure of
these switching processes. The empirical data used in this study were drawn
from the family-by-family purchase records of the J. Walter Thompson consumer
panel for the period between July 1956 and June 1957. Three product
categories were analyzed: coffee, tea, and beer.

Within each product class, they developed for every family twenty-nine raw
purchasing statistics, such as number of brand runs, number of store runs,
average length of brand runs, and average length of store runs. These
statistics were then factor analyzed by principal components in an attempt to
identify the basic dimensions of loyalty and activity. From this analysis, the
first principal component and the first four varimax-rotated factors were
identified, respectively, as:

1.  General  loyalty  to  stores  and  brands 

2.  Activity 

3.  Brand  loyalty 

4.  Store  loyalty 

5.  Consistency  with  respect  to  second  and  third  favorite  brands 
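The twenty-nine statistics are not listed in full here, but the run-based ones named above are easy to state precisely. A minimal sketch, assuming a family's record is a list of (brand, store) pairs in purchase order, computes two of them for brands and for stores; the record is hypothetical, and the factor-analysis step is not shown.

```python
from itertools import groupby


def run_statistics(choices):
    """Number of runs and average run length in a sequence of choices.

    A run is a maximal stretch of consecutive identical choices, e.g. the
    brand sequence A A B A has three runs with lengths 2, 1, 1.
    """
    run_lengths = [len(list(group)) for _, group in groupby(choices)]
    return len(run_lengths), sum(run_lengths) / len(run_lengths)


# Hypothetical purchase record for one family: (brand, store) per purchase.
record = [("A", "s1"), ("A", "s1"), ("B", "s1"), ("A", "s2"),
          ("A", "s2"), ("A", "s1"), ("C", "s1"), ("A", "s1")]
brands = [brand for brand, _ in record]
stores = [store for _, store in record]
print(run_statistics(brands))  # → (5, 1.6)
print(run_statistics(stores))  # → (3, 2.6666666666666665)
```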


See Moses (1952) for some results which relate to the power of the run test.
Also see Siegel (1956, pp. 144-145).

They then constructed a simulated population of consumers in which each
actual family in the original analysis was represented by a particular
simulated family. The simulated families had zero-order brand and store
switching processes, with quantity purchased determined by a Poisson
distribution with the minimum purchase set at one unit. The parameters of each
simulated family were estimated from the purchase history of the corresponding
actual family.
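The simulation design can be sketched as follows. This is a hypothetical reconstruction, not Massy and Frank's program: brand and store shares are estimated from the family's record, brand and store are then drawn independently of past purchases (and, as a further assumption, of each other), and quantity is taken to be one unit plus a Poisson draw, which is only one possible reading of "a Poisson distribution with the minimum purchase set at one unit."

```python
import math
import random
from collections import Counter


def poisson_draw(rng, lam):
    """Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def fit_zero_order_family(record):
    """Estimate a zero-order family from its actual purchase record.

    record: list of (brand, store, quantity) tuples, quantity >= 1.
    Choice probabilities are the observed shares. Quantity is modeled as
    1 + Poisson(lam) with lam = mean quantity - 1 (an assumed reading).
    """
    n = len(record)
    brand_p = {b: c / n for b, c in Counter(r[0] for r in record).items()}
    store_p = {s: c / n for s, c in Counter(r[1] for r in record).items()}
    lam = sum(r[2] for r in record) / n - 1
    return brand_p, store_p, lam


def simulate_family(params, n_purchases, seed=0):
    """Generate a purchase history in which brand and store are drawn afresh
    on every occasion, independently of all past purchases (zero order)."""
    brand_p, store_p, lam = params
    rng = random.Random(seed)
    history = []
    for _ in range(n_purchases):
        brand = rng.choices(list(brand_p), weights=list(brand_p.values()))[0]
        store = rng.choices(list(store_p), weights=list(store_p.values()))[0]
        history.append((brand, store, 1 + poisson_draw(rng, lam)))
    return history


actual = [("A", "s1", 1), ("A", "s1", 2), ("B", "s2", 1), ("A", "s1", 1)]
params = fit_zero_order_family(actual)
print(params)  # → ({'A': 0.75, 'B': 0.25}, {'s1': 0.75, 's2': 0.25}, 0.25)
simulated = simulate_family(params, 3 * len(actual))  # a "three times" run
```

Computing the same purchasing statistics on `simulated` as on `actual` mirrors, in miniature, the comparison of actual and simulated factor profiles described below.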

The  simulated  families  were  then  run  through  two  simulated  purchase 
histories:   one  having  the  same  number  of  purchases  as  the  corresponding  actual 
family  had  made  during  the  year,  the  other  having  three  times  the  actual  number 
of  purchases.   The  same  twenty-nine  raw  purchasing  statistics  were  computed  for 
each  simulated  family  in  these  runs  as  had  been  done  for  the  actual  purchase 
records.   These  statistics  were  then  factor  analyzed  for  each  simulation  run. 
The  factor  profiles  generated  by  the  simulated  zero-order  families  were  then 
compared  with  the  factor  profile  from  the  actual  data. 

Comparison  of  the  actual  factor  profile  with  the  two  simulated  factor 
profiles  led  the  authors  to  conclude  that: 

1. For coffee and tea, brand switching behavior is not distinguishable
from a zero-order process.

2.  For  beer,  brand  switching  seems  to  be  a  higher  order  process. 

3.  Store  switching  behavior  for  coffee,  tea,  and  beer  appears  to  be 
adequately  described  by  a  zero  order  process. 

While it is not clear how sensitive the factor profiles might be to
deviations from the Bernoulli assumption, the method used in this study is
interesting in that it explores how the underlying structure of brand and store
switching behavior affects common summary statistics of brand and store choice
behavior. Since the statistics are generated on a family-by-family basis, this
method avoids the aggregation problem.
Massy.   Massy  (1966)  examined  the  order  and  homogeneity  of  the  brand
switching  process  for  specific  families.   The  data  base  for  this  study  was  the 
Chicago  Tribune  panel  purchase  records  for  regular  coffee  between  January,  1956, 
and  February,  1959. 

Since he wanted to study the family-specific process, it was necessary to
screen the families for purchase frequency. Of the original sample of 800
families, 215 had purchased regular coffee often enough to be
included in the study. Since the inference methods used depend upon stationarity
of the process generating the observed time series, it was necessary to further
screen the families for stationarity. Massy found that of the
215 families that had passed the purchase frequency test, only 39 could also
meet the stationarity requirement.

For  this  sample  of  39  frequent  and  stationary  purchasers  of  regular  coffee 
he  then  computed  family  specific  transition  matrices  as  well  as  an  aggregate 
transition  matrix  for  all  39  families  combined.   These  transition  matrices  had 
three  states:   favorite  brand,  second  favorite  brand,  all  other  brands. 
Favorite  and  second  favorite  brands  were  defined  by  purchase  frequencies  over 
the  entire  data  period. 
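The three-state coding and the family-specific and aggregate matrices can be sketched as follows. This is an illustration under stated assumptions, not Massy's procedure: the function names and the brand sequence are hypothetical, and ties in purchase frequency are broken by first appearance.

```python
from collections import Counter

def three_state_sequence(brands):
    """Code each purchase as 1 (favorite), 2 (second favorite) or 3 (all other brands).

    Favorites are ranked by purchase frequency over the whole data period;
    ties are broken by first appearance (an assumption).
    """
    ranked = [b for b, _ in Counter(brands).most_common()]
    code = {b: 3 for b in ranked}
    for state, b in enumerate(ranked[:2], start=1):
        code[b] = state
    return [code[b] for b in brands]

def transition_counts(states, m=3):
    """Count transitions between consecutive purchases (state i -> state j)."""
    counts = [[0] * m for _ in range(m)]
    for a, b in zip(states, states[1:]):
        counts[a - 1][b - 1] += 1
    return counts

def to_probabilities(counts):
    """Row-normalize a count matrix; rows with no observations are left as zeros."""
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]

def aggregate_counts(families, m=3):
    """Pool the family-specific counts into one aggregate count matrix."""
    total = [[0] * m for _ in range(m)]
    for brands in families:
        for i, row in enumerate(transition_counts(three_state_sequence(brands), m)):
            for j, c in enumerate(row):
                total[i][j] += c
    return total

family = ["X", "X", "Y", "Z", "X", "Y"]  # hypothetical brand sequence
states = three_state_sequence(family)     # [1, 1, 2, 3, 1, 2]
family_matrix = to_probabilities(transition_counts(states))
```

Pooling counts before normalizing, as `aggregate_counts` does, weights each family by its number of purchases.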

Using  the  appropriate  Anderson  and  Goodman  (1957)  test  on  the  aggregate 
matrix,  he  found  that  the  null  hypothesis  of  a  zero  order  process  could  be 
rejected  in  favor  of  a  higher  order  process  at  the  99%  confidence  level.   Thus 
the  aggregate  switching  matrix  suggests  that  the  process  is  of  higher  order 
than  zero. 
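One common form of the Anderson and Goodman (1957) test of a zero order against a first order chain is a Pearson chi-square on the transition count table. Under the null hypothesis the next state does not depend on the current one, so it reduces to a test of independence. The sketch below is illustrative. The counts are hypothetical. The exact p-value formula covers only even degrees of freedom, such as the 4 of a three-state chain. For other cases, `scipy.stats.chi2.sf(stat, df)` can be used if SciPy is available.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, exact closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half, term, total = x / 2.0, 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def zero_vs_first_order(counts):
    """Pearson chi-square of H0 (zero order) against H1 (first order).

    Expected count for cell (i, j) under H0 is row_i * col_j / n;
    df = (m - 1)**2, assuming no empty rows or columns.
    """
    m = len(counts)
    n = sum(map(sum, counts))
    row = [sum(r) for r in counts]
    col = [sum(counts[i][j] for i in range(m)) for j in range(m)]
    stat = 0.0
    for i in range(m):
        for j in range(m):
            expected = row[i] * col[j] / n
            if expected > 0:
                stat += (counts[i][j] - expected) ** 2 / expected
    df = (m - 1) ** 2
    return stat, df, chi2_sf_even_df(stat, df)

# Hypothetical aggregate 3-state counts with strong repeat buying.
stat, df, p_value = zero_vs_first_order([[30, 0, 0], [0, 30, 0], [0, 0, 30]])
```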

Since others (Frank 1962 and Morrison 1965b) had warned of the danger of
inferring the order of a process at the individual level from results based
upon aggregate transition matrices, Massy then sought to infer the order of
the process for each family from its own transition matrix. Based upon these
disaggregated data, he found that in only 6 of the 39 cases would the null
hypothesis of a zero order process be rejected in favor of a higher order
process, even at the relatively loose 90% confidence level. Recognizing that
this does not necessarily establish the validity of a zero order switching
process for regular coffee consumption, Massy notes:

...  if  we  had  all  the  data  in  the  world  we  would  be  surprised  if  the 
probabilities  of  purchasing  different  brands  were  serially  independent. 
The  real  question  at  issue  is  whether  the  departures  from  a  zero  order 
process  are  consistently  serious  enough  to  warrant  using  the  more 
complicated  first  order  model  to  describe  brand  switching  behavior. 
The  results  suggest  that  if  we  do  not  have  strong  a  priori  views  about 
the  matrix  for  a  given  family,  the  size  of  the  departure  from  a 
condition  of  independent  trials  is  probably  small  relative  to  the  sampling 
errors  that  are  likely  to  be  obtained  when  we  estimate  the  relevant 
parameters. 

In sum, Massy's results indicate that stationarity is the exception rather
than  the  rule,  that  consumers  differ  markedly  in  their  brand  switching 
transition  matrices,  that  inferences  concerning  the  order  of  family  specific 
processes  from  aggregate  transition  matrices  are  extremely  sensitive  to  the 
assumption  of  homogeneity  and  stationarity  and  consequently  are  very  tenuous, 
and  finally,  that  a  zero  order  switching  model  seems  to  suffice  for  regular 
coffee.   This  latter  point  is  consistent  with  Frank's  (1962)  results  for 
regular  coffee  as  well  as  the  previously  discussed  Massy  and  Frank  (1964) 
results. 
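A small calculation illustrates why aggregate matrices can suggest a higher order process even when every family is zero order. Suppose, hypothetically, that two equally weighted families buy brand A with fixed probabilities 0.9 and 0.1, independent of past purchases. Within each family P(A | A) = P(A | B), yet the pooled matrix shows strong apparent "loyalty".

```python
def pooled_transitions(p_values):
    """Expected pooled P(A | A) and P(A | B) for equally weighted zero-order families.

    Family f buys brand A with probability p_f on every trial, independently of its
    previous purchase, so within each family P(A | A) = P(A | B) = p_f.
    Assumes each family contributes the same number of purchase pairs.
    """
    n = len(p_values)
    a_then_a = sum(p * p for p in p_values) / n
    b_then_a = sum((1 - p) * p for p in p_values) / n
    share_a = sum(p_values) / n
    return a_then_a / share_a, b_then_a / (1 - share_a)

# Two hypothetical families, one mostly buying A and one mostly buying B.
repeat_a, switch_to_a = pooled_transitions([0.9, 0.1])
```

Here the pooled P(A | A) is 0.82 and P(A | B) is 0.18, although neither family depends on its last purchase. With identical families the two pooled values coincide.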

Morrison. Morrison (1965b) has developed statistical tests and estimation
procedures for heterogeneous populations of consumers whose brand choice
behavior in the short run may be described by Bernoulli trials. He assumes
that each consumer, say consumer i, has some probability $p_i$ of purchasing
brand A versus all other brands on each purchase occasion. The postulated
Bernoulli process for each individual is assumed to be stationary in the
short run; that is, the $p_i$ for each individual remains constant over a few
trials. The population is also assumed to be heterogeneous, which in this
case means that $p_i$ is distributed across the population of consumers. He
provides statistical procedures for both arbitrary and beta distributions
of $p_i$. His Bernoulli trials formulation turns out to be a special case of
one of his heterogeneous Markov models. For a more complete discussion of
Morrison's work see his paper at the end of this section.
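When $p_i$ follows a beta distribution, two standard consequences follow. First, the number of brand A purchases in n trials for a randomly chosen consumer is beta-binomial. Second, the pooled repeat probability is $E[p^2]/E[p]$. The sketch below shows both. The Beta(2, 3) parameters are hypothetical. This is not Morrison's estimation procedure, only the distributional consequences of the beta assumption.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k, n, alpha, beta):
    """Probability that a randomly chosen consumer buys brand A on k of n trials,
    when each consumer's p_i is fixed over the n trials and p_i ~ Beta(alpha, beta)."""
    return math.comb(n, k) * math.exp(
        log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))

def pooled_repeat_probability(alpha, beta):
    """P(A on the next trial | A on this trial), pooled over consumers: E[p^2] / E[p]."""
    return (alpha + 1) / (alpha + beta + 1)

# Hypothetical heterogeneity: Beta(2, 3), i.e. mean p_i = 0.4, over 6 trials.
distribution = [beta_binomial_pmf(k, 6, 2.0, 3.0) for k in range(7)]
```

The pooled repeat probability exceeds the mean $p_i$ even though each individual is Bernoulli, which is the heterogeneity effect discussed above.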

Montgomery.   Montgomery  [1966]  has  developed  a  model  which  can  account 
for  both  heterogeneity  and  non-stationarity  in  a  two-state  zero-order  process. 
The  model  yields  estimates  of  the  distribution  of  response  probability  across 
the population of consumers, the expected equilibrium share for each alternative,
the rate at which the market will approach its steady state from any
disequilibrium  position,  and  the  propensity  for  the  response  probability 
toward  a  given  response  to  increase.   These  measures  as  well  as  the  structural 
characteristics  of  the  model  are  of  interest  in  consumer  product  markets. 

Methods  have  been  developed  for  estimating  and  testing  the  model.   In 
an  initial  empirical  test  the  model  was  found  to  be  an  excellent  fit  to 
M.R.C.A.  National  Consumer  Panel  data  in  the  product  class  of  dentifrice 
just  before  and  just  after  the  American  Dental  Association  endorsed  Crest 
toothpaste  in  August,  1960.   Thus  the  model  was  found  to  be  empirically  viable 
in both the relatively normal pre-endorsement market and the unstable
post-endorsement market.

The  model,  termed  a  probability  diffusion  model,  describes  the  response 
to  response  (say,  purchase  to  purchase)  behavior  of  a  set  of  heterogeneous, 
non-stationary  consumers.   Since  interresponse  time  is  not  explicitly  modeled, 
it is necessary to use an ad hoc segmentation criterion, such as average
interpurchase time, in applications where we want to draw real-time inferences (such
as market share) as opposed to structural inferences. As a probability
diffusion  model  it  may  be  described  as  a  continuous  time,  continuous  state 
space  stochastic  process. 

Howard.   Ronald  Howard  [1965]  has  proposed  another  heterogeneous,  non- 
stationary  zero  order  model  which  may  be  applied  to  consumer  behavior.   In 
his  model  the  underlying  parameters  of  a  stochastic  process  which  generates 
observable  outcomes  are  themselves  subject  to  change  at  times  determined  by 
yet another stochastic process. The heterogeneity enters when these underlying
parameters are themselves drawn from some probability distribution.
In  essence,  then,  Howard's  model  undergoes  discrete  changes  at  randomly 
determined  intervals.   He  presented  an  example  having  a  Bernoulli  observable 
process,  a  beta  parameter  distribution,  and  a  geometric  distribution  for  the 
time  between  parameter  changes. 
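Howard's example can be sketched as a simulation for one consumer. The sketch makes several assumptions not stated in the text: the Bernoulli parameter is redrawn from the same beta distribution at each change, and each consumer starts from an independent draw. The parameter values and names are hypothetical.

```python
import random

def simulate_howard(n_trials, alpha, beta, change_prob, rng):
    """One consumer in a Howard-style model (illustrative sketch).

    Observable process: Bernoulli purchases of brand A with parameter p.
    Parameter process: p is drawn from Beta(alpha, beta); after each trial it is
    redrawn with probability change_prob, so the number of trials between
    changes is geometric with mean 1 / change_prob.
    """
    p = rng.betavariate(alpha, beta)
    purchases, parameters = [], []
    for _ in range(n_trials):
        purchases.append(1 if rng.random() < p else 0)
        parameters.append(p)
        if rng.random() < change_prob:
            p = rng.betavariate(alpha, beta)
    return purchases, parameters

# A hypothetical population: each consumer starts with their own draw of p.
rng = random.Random(0)
population = [simulate_howard(50, 2.0, 3.0, 0.1, rng) for _ in range(100)]
```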

3.2 Markov Models

A  Markov  process  is  a  model  for  describing  a  sequence  of  events  for  which 
the  probability  of  the  next  event  in  the  sequence  is  dependent  only  upon  the 
present  event.    A  common  example  in  consumer  behavior  is  where  we  consider 
the  events  as  brands  of  purchase,  although  one  may  certainly  model  attitudes 
and  other  aspects  of  consumer  behavior  in  this  manner.   For  instance,  see  the 
paper  by  Lipstein. 

Consider  the  following  simple  Markov  model  of  brand  switching.   For 
purposes  of  exposition,  we  take  a  two  brand  market  (brands  A  and  B)  and  define 
the  state  of  a  consumer  in  a  given  time  period  as  his  brand  of  purchase  during 
that time period. Since some consumers may not buy either brand during a given
period,  we  define  the  state  of  the  consumer  as  follows: 


Strictly  speaking  this  is  a  first-order  Markov  process,  but  higher  order 
processes  (i.e.,  the  probability  of  the  next  event  depends  upon  several 
preceding  events)  may  be  made  equivalent  to  a  first  order  process  by  proper 
definition  of  the  state  space. 
State A: brand A was purchased this period
State B: brand B was purchased this period
State N: no purchase this period

Note that we have implicitly assumed that no consumer will make more than one
purchase during the time period.¹² In this model we postulate that all
consumers may be represented by the following transition matrix:

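Rows give the purchase at time t; columns give the purchase at time t + 1.

$$
\begin{array}{c|ccc}
 & A & B & N \\ \hline
A & P_{A,A} & P_{A,B} & P_{A,N} \\
B & P_{B,A} & P_{B,B} & P_{B,N} \\
N & P_{N,A} & P_{N,B} & P_{N,N}
\end{array}
$$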

Note  that  consumers  are  assumed  to  be  homogeneous  with  respect  to  their 
transition  matrices. 

We also assume that the $P_{i,j}$ ($i, j = A, B, N$) are independent of t. That
is, we assume that the Markov process is stationary in addition to being
homogeneous between individuals. We note that

$$\sum_{j=A,B,N} P_{i,j} = 1 \qquad \text{for } i = A, B, N.$$


What then are these $P_{i,j}$'s? They are the transition probabilities, which give
the probability that a consumer who purchases brand i at time t will purchase
brand j at time t + 1, where we treat the no-purchase state as a dummy brand.

The transition probability $P_{i,i}$ is interpreted as a measure of brand i's
retentive or holding power, while $P_{i,j}$ ($i \neq j$) is a measure of brand j's power to
attract customers from brand i. From the theory of Markov chains we may
derive the expected steady-state or equilibrium distribution of consumers (i.e.,


12 


Multiple purchases could be handled in this simple model at the cost of
a greatly expanded state space. For example, we could have State $A_1$ = one
purchase of brand A, State $A_2$ = two purchases of brand A, ..., State $AB$ =
one purchase of brand A and one of brand B, ..., etc.
expected brand shares). A number of other interesting measures, such as the
expected number of periods before an individual will try a particular brand,
are also available.
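Both measures can be computed directly from a transition matrix. Let π be the equilibrium distribution. The expected time to first purchase of a target brand from each other state solves $m_i = 1 + \sum_{k \neq \text{target}} P_{i,k} m_k$. The matrix below is hypothetical. The small Gauss-Jordan solver avoids any library dependency.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def steady_state(P):
    """Equilibrium distribution pi with pi P = pi and sum(pi) = 1."""
    n = len(P)
    a = [[P[i][j] - (1.0 if i == j else 0.0) for i in range(n)] for j in range(n)]
    a[-1] = [1.0] * n  # replace one redundant balance equation by the normalization
    return solve(a, [0.0] * (n - 1) + [1.0])

def mean_time_to_trial(P, target):
    """Expected number of periods until the first purchase of `target`,
    from each other starting state: m_i = 1 + sum_{k != target} P[i][k] m_k."""
    others = [i for i in range(len(P)) if i != target]
    a = [[(1.0 if i == k else 0.0) - P[i][k] for k in others] for i in others]
    return dict(zip(others, solve(a, [1.0] * len(others))))

# Hypothetical matrix over states A, B, N (rows: time t, columns: time t + 1).
P = [[0.7, 0.1, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
shares = steady_state(P)
periods_to_try_b = mean_time_to_trial(P, target=1)
```

For this matrix the equilibrium shares are 0.45, 0.30, and 0.25. A consumer currently buying A takes about 6.67 periods on average to first buy B, and one currently not buying takes 5.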

In our subsequent review of Markov model applications, we shall use this
simple model as a point of departure.

Harary and Lipstein. Harary and Lipstein¹⁴ found it useful to
disaggregate the overall transition matrix into a hard core matrix and a
switchers matrix. This procedure eliminates some of the problems associated
with the homogeneity assumption. The hard core matrix was composed of all
those households devoting three-quarters or more of their purchases in the
product class to one particular brand. Their empirical experience has shown
that products seem to have from 50% to 80% hard core consumers.
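The three-quarters rule for separating hard core households from switchers can be sketched as below. The household identifiers and purchase histories are hypothetical. Each resulting group would then be given its own transition matrix.

```python
from collections import Counter

def is_hard_core(brands, threshold=0.75):
    """True if one brand accounts for at least `threshold` of the household's purchases."""
    if not brands:
        return False
    top_count = Counter(brands).most_common(1)[0][1]
    return top_count / len(brands) >= threshold

def split_households(histories, threshold=0.75):
    """Split {household_id: brand sequence} into hard core and switcher groups."""
    hard_core = {h: b for h, b in histories.items() if is_hard_core(b, threshold)}
    switchers = {h: b for h, b in histories.items() if h not in hard_core}
    return hard_core, switchers

# Hypothetical panel histories.
panel = {
    "h1": ["X", "X", "X", "Y"],  # 75% brand X: hard core
    "h2": ["X", "Y", "X", "Y"],  # 50% each: switcher
    "h3": ["Z"] * 5,             # 100% brand Z: hard core
}
hard_core, switchers = split_households(panel)
```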

Three major types of dynamic market predictions were made using the
Markov chain analysis. The first was the expected steady-state or equilibrium
market shares. The expected equilibrium market shares are the shares
the brands would be expected to have if the process of brand switching, as currently
estimated by the transition matrix, were allowed to run to equilibrium.
Thus these equilibrium shares provide a useful measure of the direction in
which the market is heading. In any real market situation the equilibrium
share is rarely reached, because competitive activity tends to alter the
values of the transition probabilities that make up the transition matrix.

The second prediction is the average time to trial. This yields information
on the average number of periods that will pass before a consumer
tries a particular brand in this product class. This is a measure of the


13 

For a more complete discussion see the paper by Harary and Lipstein.

14 

See  the  reading  at  the  end  of  this  chapter. 
attractive power of the brand. Finally, they suggest using the Markov chain
model  to  evaluate  the  success  of  a  new  product  introduction.   The  model  is 
used  to  describe  the  evolution  of  brand  shares,  new  triers,  repeat  buying 
rates,  and  the  proportion  of  hard  core  buyers  for  the  new  product. 

Other Applications. Maffei (1960) was one of the first to suggest a
Markov  model  of  consumer  behavior.   His  discussion  includes  a  rather  thorough 
treatment  of  the  mathematics  for  the  two-state  case. 

Herniter and Magee (1961) also discussed customer behavior as a Markov
process. Their approach associates rewards with the various states of the
consumer. They also discuss the transient and steady-state behavior of the process
and several applications. Their summary view of the role of Markov chains is:

In  summary,  we  believe  treatment  of  population  change  as  a  Markov 
process  provides  a  means  which  is  both  manageable  and  consistent  with 
behavior: 

1. Optimization calculations. We have indicated some progress on the
choice of a promotional policy for maximum long-term profit. The
model provides for direct and explicit recognition of the significance
of long-term population changes to current decisions.

2. Experimental design. The model clarifies the nature of experiments
needed for measurement of advertising and other promotional effects.
It provides a framework for interpreting experimental results.

3. Simulation. The model can be used directly to calculate and plot
out changes in revenue, cost, and the mix of the consumer population
or the size of critical states over time, under alternative
marketing strategies.

4. Sensitivity analysis. The characterization of customer-population
behavior as a Markov process provides a structure for investigating
the influence on policy choices of alternative values of the return
vector, discount rate, costs, and transition rates.
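Associating a per-period return with each consumer state leads naturally to an expected discounted return for each starting state, $v = r + \delta P v$, where r is the return vector and δ the discount factor. The sketch below solves this by successive approximation. The states, returns, and the two "policies" are hypothetical, and policy costs are ignored. It is an illustration of the reward idea, not Herniter and Magee's own formulation.

```python
def expected_discounted_return(P, r, discount, tol=1e-12, max_iter=100_000):
    """Expected present value of returns from each starting consumer state.

    State i earns r[i] per period, consumers move between states by the
    transition matrix P, and returns are discounted by `discount` per period.
    Solves v = r + discount * P v by successive approximation (needs discount < 1).
    """
    n = len(r)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [r[i] + discount * sum(P[i][j] * v[j] for j in range(n))
                 for i in range(n)]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    raise RuntimeError("no convergence; check that discount < 1")

# Hypothetical example: state 0 = buys our brand (return 1.0 per period),
# state 1 = buys a competitor (return 0.0). The "promoted" policy is assumed
# to raise the rate of switching back to our brand; its cost is ignored here.
returns = [1.0, 0.0]
P_base = [[0.8, 0.2], [0.1, 0.9]]
P_promoted = [[0.8, 0.2], [0.3, 0.7]]
v_base = expected_discounted_return(P_base, returns, 0.9)
v_promoted = expected_discounted_return(P_promoted, returns, 0.9)
```

Subtracting a policy's discounted cost from the difference between `v_promoted` and `v_base` would give the kind of comparison the quotation describes.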

Styan and Smith. Styan and Smith (1964) used Markov chains to analyze
product switching behavior for a panel of British housewives. For the
twenty-six week period between January and June, 1957, each housewife's purchase


Herniter  and  Magee  (1961,  pp.  121-22). 
behavior  in  the  laundry  powder  market  was  classified  into  one  of  the  following
four  mutually  exclusive  and  collectively  exhaustive  categories: 

1.  Bought  detergent  only 

2.  Bought  soap  powder  only 

3.  Bought  both  detergent  and  soap  powder 

4.  Bought  no  laundry  powder  at  all 

These  categories  define  the  "state  space"  of  the  Markov  chain  analysis. 

The twenty-six week period enabled them to compute twenty-five two-period
transition matrices for aggregate switching behavior. That is, a
transition matrix was computed for times t - 1 versus t for t = 1, 2, ..., 25.
It should be emphasized that for any two-period transition matrix, the data
for all households were aggregated.

Using the $\chi^2$ test developed by Anderson and Goodman (1957), they
tested the order of the Markov chain. The null hypothesis and the alternative
were:

$H_0$: the data are from a zero order Markov chain (i.e., a zero-order
process)

$H_1$: the data are from a first order chain

Each of the twenty-five matrices was tested for $H_0$ versus $H_1$. It was
found that $H_0$ could be rejected in favor of $H_1$ at a very high level of
significance. Thus the aggregate data behaved according to a higher than zero
order Markov chain. In this case there were not sufficient data to test the
first order hypothesis against second and higher order alternatives. Note,
however, that while this aggregate first order behavior may be useful from a
marketing standpoint, it does not establish first order Markov behavior on the
part of the individual households that enter this analysis. Recall Massy's
(1966) finding that aggregation led to highly significant inferences of a
first order process, whereas disaggregation for the same sample of households
tended to support the notion of zero order or independent trials behavior.
Frank (1962) and Morrison (1965b) have also warned of the danger of inferring
individual behavior from aggregate transition matrices when the models
used assume homogeneity of the individuals in the sample.

The  stationarity  of  the  transition  matrices  over  the  twenty-six  week 
period  was  also  tested.   In  this  case,  the  null  hypothesis  of  stationarity 
could  not  be  rejected  since  the  significance  level  of  the  test  was  over  24%. 
Hence,  the  aggregate  data  appear  to  be  consistent  with  a  stationary,  first 
order  Markov  chain. 
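The stationarity check can be sketched with the chi-square form of the Anderson and Goodman test. It compares each period's transition proportions with the pooled estimate, weighting by the number of households leaving each state. The count matrices below are hypothetical. A p-value can then be read from a chi-square table with the returned degrees of freedom, or from `scipy.stats.chi2.sf(stat, df)` if SciPy is available.

```python
def stationarity_test(period_counts):
    """Chi-square statistic for stationarity of a Markov chain (illustrative sketch).

    period_counts[t][i][j] = number of households moving from state i to state j
    between periods t and t + 1. The statistic sums
    n_i(t) * (p_ij(t) - p_ij)**2 / p_ij over periods and cells, where p_ij is the
    pooled estimate; df = (T - 1) * m * (m - 1) when no cell is structurally empty.
    """
    T, m = len(period_counts), len(period_counts[0])
    pooled = [[sum(c[i][j] for c in period_counts) for j in range(m)] for i in range(m)]
    p_hat = [[x / sum(row) if sum(row) else 0.0 for x in row] for row in pooled]
    stat = 0.0
    for c in period_counts:
        for i in range(m):
            n_i = sum(c[i])
            for j in range(m):
                if n_i and p_hat[i][j] > 0:
                    stat += n_i * (c[i][j] / n_i - p_hat[i][j]) ** 2 / p_hat[i][j]
    return stat, (T - 1) * m * (m - 1)

# Two hypothetical two-period matrices (2 states) in which row 0 changes sharply.
stat, df = stationarity_test([[[8, 2], [5, 5]], [[2, 8], [5, 5]]])  # about 7.2, df = 2
```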

They  also  discuss  the  use  of  the  limiting  distribution  of  market  share 
for  the  four  alternatives.   It  was  found  that  the  market  shares  for  these 
twenty-six  weeks  did  not  vary  much  from  the  equilibrium  market  shares 
predicted  by  the  transition  matrix  formed  by  aggregating  over  all  twenty- 
five  of  the  two-period  matrices.   Hence  this  market  was  very  close  to  its 
steady-state  condition. 

The methodological significance of this paper rests upon the fact that
the authors tested the assumptions of their aggregate Markov model and that their
discussion and application were data-based. Hopefully, this approach will
become more common.

Howard.   Howard   (1963)  objected  to  the  use  of  the  somewhat  artificial 
"no  purchase"  state  and  suggested  a  semi-Markov  chain  model  to  overcome  this 
problem.   In  the  semi-Markov  model  the  transition  probabilities  are  conditional 
upon  a  transition  being  made.   That  is,  they  represent  the  state-to-state 
transition  probabilities  which  are  operative  whenever  a  transition  is  made. 


16

This  paper  is  reproduced  at  the  end  of  this  chapter. 
Transitions occur stochastically in the semi-Markov model. For example, in a
brand  switching  application  the  time  between  purchases  is  probabilistically 
determined.   The  time  to  the  next  purchase  distribution  may  be  a  function  of 
the  current  state.   Thus  the  average  time  to  the  next  purchase  may  be  longer 
if  the  last  response  was  response  A  than  if  it  was  response  B.   If  state  A 
represents  a  purchase  of  two  pounds  of  the  product  and  state  B  represents  a 
purchase  of  one  pound,  we  see  that  the  semi-Markov  formulation  will  enable 
us  to  account  for  quantity  purchased  and  its  effect  on  inter-purchase  time. 
We note that, in general, the waiting time distribution and the Markov transition
probabilities need not be independent.
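A semi-Markov purchase process can be sketched as follows. The embedded matrix decides which state comes next. The waiting time until that purchase depends on the current state. For simplicity this sketch assumes exponential waiting times that do not depend on the next state, which is a special case of the more general model. The states, matrix, and mean waits are hypothetical.

```python
import random

def simulate_semi_markov(P, mean_wait, start, n_purchases, rng):
    """Purchase events (time, state) from a semi-Markov model (illustrative sketch).

    P is the embedded transition matrix, used only when a transition (purchase) occurs.
    The wait until the next purchase depends on the current state; here it is assumed
    to be exponential with mean mean_wait[state] and independent of the next state.
    """
    t, state = 0.0, start
    events = [(t, state)]
    for _ in range(n_purchases - 1):
        t += rng.expovariate(1.0 / mean_wait[state])
        state = rng.choices(range(len(P)), weights=P[state])[0]
        events.append((t, state))
    return events

# Hypothetical: state 0 = bought two pounds, state 1 = bought one pound (weeks).
events = simulate_semi_markov([[0.5, 0.5], [0.5, 0.5]], [4.0, 2.0], 0, 5000, random.Random(7))
gaps = {0: [], 1: []}
for (t0, s0), (t1, _) in zip(events, events[1:]):
    gaps[s0].append(t1 - t0)
mean_gap = {s: sum(g) / len(g) for s, g in gaps.items()}
```

As intended, the average time to the next purchase is longer after a two-pound purchase than after a one-pound purchase.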

Howard  also  considered  certain  interpretation  errors  which  have  sometimes 
been  made  in  using  Markov  models.   In  particular,  he  pointed  out  that 
fluctuations  in  brand  share  will  be  expected  to  continue  into  the  indefinite 
future. The steady-state or equilibrium distribution is really the distribution
expected in the steady state, but the process will vary about this
distribution  even  in  the  steady-state.   Some  confusion  had  existed  when  others 
had  treated  the  steady-state  distribution  as  a  fixed  distribution  for  the 
process  at  equilibrium. 
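Howard's point can be seen in a short simulation of a finite population moving through a hypothetical two-state chain. The share in state 0 settles around the equilibrium value $P_{1,0}/(P_{0,1} + P_{1,0})$, but it keeps fluctuating from period to period. It never locks onto that value. The population size, horizon, and burn-in are assumptions chosen for illustration.

```python
import random

def simulate_shares(P, n_consumers, n_periods, rng):
    """Each period's share of a finite population in state 0 under a two-state chain P."""
    states = [0 if rng.random() < 0.5 else 1 for _ in range(n_consumers)]
    shares = []
    for _ in range(n_periods):
        states = [0 if rng.random() < P[s][0] else 1 for s in states]
        shares.append(states.count(0) / n_consumers)
    return shares

P = [[0.8, 0.2], [0.3, 0.7]]                       # hypothetical transition matrix
equilibrium = P[1][0] / (P[0][1] + P[1][0])        # 0.6 for this matrix
shares = simulate_shares(P, 500, 300, random.Random(3))[50:]  # drop a burn-in
mean_share = sum(shares) / len(shares)
spread = max(shares) - min(shares)
```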

Telser.   Telser  [1962]  used  a  variant  of  the  Markov  brand  switching  model 
to  develop  estimates  of  the  price  elasticities  of  branded  goods.   His  model 
might  be  termed  a  variable  Markov  process  in  that  period  to  period  brand 
transitions are made functions of the brand prices. Using Telser's notation,
this  may  be  expressed  in  general  form  (for  the  case  where  we  just  consider  one 
brand  of  interest  and  the  aggregate  of  all  other  brands)  as 


$a_{ii} = f_{ii}(P_{it}, P^{*}_{it})$ = the conditional probability of repeating the
purchase of brand i during period t, as a function of $P_{it}$ and $P^{*}_{it}$

and

$a_{ki} = f_{ki}(P_{it}, P^{*}_{it})$ = the conditional probability of purchases shifting
to brand i from all other brands during period t, as a function of $P_{it}$ and $P^{*}_{it}$

where

$P_{it}$ = the price of brand i during period t

$P^{*}_{it}$ = the average price of all other brands during period t

In order to obtain estimates, some further specification of the functions
f is necessary. Telser chose the simplest specification by making f a linear
function of the sample period average prices. In particular, he let
$p_{it} = P_{it} - P^{*}_{it}$ be the price variable and specified the functions as

$$a_{ii} = f_{ii}(p_{it}) = c_{ii} + b_i\, p_{it}$$

and

$$a_{ki} = f_{ki}(p_{it}) = c_{ki} + b^{*}_i\, p_{it}$$

Since the magnitudes of the probabilities $a_{ii}$ and $a_{ki}$ will vary inversely with
$p_{it}$, both $b_i$ and $b^{*}_i$ will be negative.¹⁷
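The variable Markov process can be sketched for the two-brand case (brand i versus the aggregate of all others). Each period's share is computed from last period's share and the price-dependent transition probabilities. The parameter values are hypothetical, and the check that the linear probabilities stay within [0, 1] is an added safeguard, not part of Telser's specification.

```python
def telser_shares(m0, price_diffs, c_ii, c_ki, b_i, b_star_i):
    """Brand i's market share over time in a two-brand variable Markov process.

    a_ii = c_ii + b_i * p_t       (repeat probability for brand i)
    a_ki = c_ki + b_star_i * p_t  (probability of shifting to i from the other brands)
    m_t  = m_{t-1} * a_ii + (1 - m_{t-1}) * a_ki
    where p_t = P_it - P*_it is brand i's price minus the average competing price.
    """
    shares, m = [], m0
    for p in price_diffs:
        a_ii = c_ii + b_i * p
        a_ki = c_ki + b_star_i * p
        if not (0.0 <= a_ii <= 1.0 and 0.0 <= a_ki <= 1.0):
            raise ValueError(f"linear specification leaves [0, 1] at p = {p}")
        m = m * a_ii + (1.0 - m) * a_ki
        shares.append(m)
    return shares

# Hypothetical parameters with b_i and b*_i negative, as the argument above implies.
params = dict(c_ii=0.8, c_ki=0.1, b_i=-0.2, b_star_i=-0.1)
at_par = telser_shares(0.5, [0.0] * 200, **params)   # approaches 0.1 / 0.3 = 1/3
premium = telser_shares(0.5, [0.5] * 200, **params)  # approaches 0.05 / 0.35 = 1/7
```

With a constant price difference the share converges to $a_{ki}/(1 - a_{ii} + a_{ki})$, so a sustained price premium lowers brand i's long-run share.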

He also presented a method for estimating the transition probabilities
$a_{ii}$ and $a_{ki}$ from market share data. The market share for brand i in period t
is denoted by $m_{it}$. Starting with the relation

$$m_{it} = \sum_{j=1}^{n} m_{j,t-1}\, a_{ji} \qquad \text{for } i = 1, \ldots, n$$

between the transition probabilities, lagged market share, and current market
share, and using the linear approximations for $a_{ii}$ and $a_{ki}$ given above, he
developed an estimating equation of the form


This will be true except when the product has a price-quality or price-snob
appeal association. This is not the case for the products considered
by Telser.
$$m_{it} = \alpha + \beta\, m_{i,t-1} + \gamma\, p_{it} + \mu_{it} \qquad (3\text{-}5)$$

where $\mu_{it}$ is the residual and the parameters α, β, and γ are functions of
previously specified parameters.
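A worked sketch shows how the parameters of (3-5) can relate to the transition probabilities. It is not necessarily Telser's own derivation. Take the two-brand case (brand i versus the aggregate k of all other brands), so that $m_{k,t-1} = 1 - m_{i,t-1}$. Substituting the linear specifications into the share relation gives

$$
\begin{aligned}
m_{it} &= m_{i,t-1}\, a_{ii} + (1 - m_{i,t-1})\, a_{ki} \\
&= c_{ki} + b^{*}_i\, p_{it} + (c_{ii} - c_{ki})\, m_{i,t-1} + (b_i - b^{*}_i)\, p_{it}\, m_{i,t-1}.
\end{aligned}
$$

Under the simplifying assumption $b_i = b^{*}_i$, the interaction term $p_{it}\, m_{i,t-1}$ drops out. The expression then has exactly the form of (3-5), with $\alpha = c_{ki}$, $\beta = c_{ii} - c_{ki}$, and $\gamma = b_i$. Otherwise (3-5) is best read as an approximation that omits that term.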

Telser used M.R.C.A. National Consumer Panel data for the period from
April 1954 to March 1957 in estimating (3-5). Using monthly market share and
price data, he estimated (3-5) for each of several brands in the product
classes of frozen orange juice, instant coffee, and regular coffee. The
multiple regression coefficients for these regressions ranged from 0.57 to
0.92, with most values over 0.70. The elasticity estimates derived from
these fitted regressions were (averaged by brands)

5.7  for  frozen  orange  juice 

5.5   for  instant  coffee 

4.4  for  regular  coffee 

He also developed an equation using a relative price variable rather than
the price difference $p_{it} = P_{it} - P^{*}_{it}$. These results are reported in Telser [1962].

While some questions have been raised concerning the properties of the
estimates in this approach,¹⁸ it has the advantage that the model can be
applied even when relatively little data (and that in aggregate form) are
available. Another limitation, however, is that there must be period-to-period
changes in market share if meaningful estimates are to be obtained.
In any case, this approach merits further research.

Morrison. Morrison (1965b) has developed two interesting new Markov
models. In previous applications of Markov models it was always assumed that
each and every consumer had the same brand switching matrix. Even when the
total switching matrix is disaggregated into a hard core and a switchers

18 

Morrison  [1965b]  indicates  that  Nerlove  has  raised  some  questions  along  these 

lines  about  another  paper,  Telser  [1963]. 
matrix,  previous  applications  have  assumed  identical  switching  matrices  for
individuals  within  these  disaggregated  segments.   He  generalized  these  previous 
models  to  include  the  case  where  individuals  may  be  heterogeneous  with  respect 
to  their  brand  switching  matrices. 

His  models  yield  interesting  inferences  as  to  the  structure  of  brand 
loyalty in a market. The Brand Loyal Model considers loyalty to be
oriented  toward  a  particular  brand,  whereas  the  Last  Purchase  Loyal  Model 
assumes  that  loyalty  is  generated  toward  the  brand  last  purchased.   His 
heterogeneous  Bernoulli  model  discussed  earlier  turns  out  to  be  a  special  case 
of  the  Brand  Loyal  Model. 

He  develops  a  minimum  chi  square  procedure  for  estimating  and  testing 
these  models  with  an  arbitrary  heterogeneity  distribution.   Such  a  procedure 
yields  best  asymptotically  normal  estimates  of  the  parameters  as  well  as  a  chi 
square statistic that measures the overall "goodness of fit" of the model to
the data.¹⁹ It should also be noted that he has derived the power of the
chi  square  tests  against  specific  alternatives,  thus  facilitating  a  direct 
comparison  of  alternative  models. 

Attention  was  then  focused  on  an  initial  empirical  comparison  of  these 
models.   Using  Chicago  Tribune  panel  coffee  data  from  January  1956  through 
February 1959, he fitted each of his models to several alternative segmentations
of the market. He found that the Brand Loyal Model yielded the best fits
as  measured  by  the  chi  square  statistics.   Since  each  of  these  models  has  a 
different  number  of  degrees  of  freedom,  one  really  should  compare  their 
respective  chi  square  probability  levels  rather  than  the  actual  values. 


For  a  more  complete  discussion  of  estimating  and  testing  stochastic  models 
see  Massy,  Montgomery,  and  Morrison  [1967,  Chapter  2]. 
However,  the  difference  in  degrees  of  freedom  is  not  sufficient  to  change  the
conclusions  in  the  present  case.   A  far  more  serious  problem  in  his  empirical 
results  is  his  use  of  overlapping  response  sequences.   This  procedure,  while 
perhaps  not  invalidating  the  comparison  between  his  models,  nevertheless  is 
likely  to  bias  his  empirical  chi  square  statistics.   Thus  we  must  be  careful 
as  to  what  we  conclude  regarding  the  "goodness  of  fit"  of  the  Brand  Loyal 
Model  in  an  absolute  sense. 

Further, he found that consumers were not too far from "Bernoulliness" in
their brand choice behavior as measured by k in the Brand Loyal Model. In
another portion of his empirical work he found that for the coffee data, time
between purchases does not have a significant effect on brand loyalty.²⁰ This
is in contrast to Kuehn's results for frozen orange juice.

For  details  of  the  models  see  the  paper  at  the  end  of  this  chapter. 

Lipstein. Lipstein²¹ [1965] has developed a nonstationary Markov model
relating advertising effort to attitude changes and consumer purchases. He
presented a new analytic method for handling non-stationary stochastic
matrices. Certain aspects of the behavior of the market system can be
inferred from this method; for example, the likelihood that observed changes
will persist. His model would seem to hold some promise as
a measurement tool in a dynamic market environment.

3.3  Linear  Learning  Models 

Kuehn. Kuehn (1958 and 1962) used a modified form of Bush and Mosteller's
linear learning theory as a model of consumer brand choice.²² The linear


20 

The most accessible presentation of these results is in Morrison [1966b].

21 

This  article  is  reproduced  at  the  end  of  this  chapter. 
See  Bush  and  Mosteller  (1955).
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learning  model  assumes  that  a  purchase  of  a  brand  will  increase  its  probability 

of  being  repurchased.   The  model  is  outlined  below. 

Suppose we have a two-brand market. One of these two brands might well be a brand of particular interest to us (brand A), while the other brand may simply represent all other brands. In the two-brand market it is sufficient to consider $P(A_t) = P_t$, the probability of purchasing brand A at time $t$, where time indexes purchase events. As we mentioned above, the linear learning model assumes that the actual response made at time $t$ affects the probability of purchase at time $t+1$. In particular, the following pair of linear equations summarizes this assumption.

$$P_{t+1} = \alpha_1 + \beta_1 P_t \quad \text{if brand A was purchased at time } t$$

$$P_{t+1} = \alpha_0 + \beta_0 P_t \quad \text{if some other brand was purchased at time } t$$

Note that while $P_{t+1}$ depends upon $P_t$ and the response made at time $t$, $P_t$ itself summarizes the influence of all past purchases. If we consider the probability of purchasing brand A to be the state of the model, the system is first-order Markov. If the actual purchase decisions (purchase of brand A or some other brand) are considered to be the states of the system, the model is of infinite order on its state space. The linear equations for $P_{t+1}$ are depicted in Figure 3-1.
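As a concrete illustration, the two linear operators can be coded directly. This is a minimal Python sketch with hypothetical parameter values (not estimates from Kuehn or any other study); the letters "A" and "O" stand for a purchase of brand A and of some other brand.

```python
# Minimal sketch of the two-operator linear learning model.
# Parameter values used below are hypothetical, for illustration only.


def purchase_operator(p, alpha1, beta1):
    """P_{t+1} when brand A was purchased at time t."""
    return alpha1 + beta1 * p


def rejection_operator(p, alpha0, beta0):
    """P_{t+1} when some other brand was purchased at time t."""
    return alpha0 + beta0 * p


def probability_path(history, p0, alpha1, beta1, alpha0, beta0):
    """Apply the operators along a purchase history such as "AAOA"
    ('A' = brand A, 'O' = other brand) and return [P_0, P_1, ..., P_n]."""
    path = [p0]
    for choice in history:
        p = path[-1]
        if choice == "A":
            path.append(purchase_operator(p, alpha1, beta1))
        else:
            path.append(rejection_operator(p, alpha0, beta0))
    return path


params = {"alpha1": 0.3, "beta1": 0.6, "alpha0": 0.05, "beta0": 0.6}
p0 = 0.5
p1_after_a = purchase_operator(p0, params["alpha1"], params["beta1"])   # P_1^A
p1_after_o = rejection_operator(p0, params["alpha0"], params["beta0"])  # P_1^O
print(p1_after_a, p1_after_o)
print(probability_path("AAOA", p0, **params))
```

With these values a purchase of brand A raises the probability whenever $P_t < \alpha_1/(1-\beta_1) = 0.75$, and a purchase of another brand lowers it whenever $P_t > \alpha_0/(1-\beta_0) = 0.125$.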

Suppose we take a consumer who has probability $P_0$ of purchasing brand A at time zero. That is, we are at $P_0$ in Figure 3-1. Now suppose further that he purchases brand A at time zero. Since an actual purchase of brand A is assumed to enhance his probability of purchasing brand A on the next trial, we find the probability of his purchasing brand A at time one by reading from $P_0$ up to the purchase operator and then over to $P_1^A$. The superscript denotes a purchase of brand A in the previous trial. Similarly, if he purchased some other brand at time zero, his probability at time one would be $P_1^O$, as read from the rejection operator. Note that the purchase of some other brand at time $t$ will in general decrease the consumer's probability of purchasing brand A at time $t+1$.

Figure 3-1
KUEHN'S LINEAR LEARNING MODEL
[Figure: $P_{t+1}$ plotted against $P_t$, showing the purchase operator (slope $\beta_1$) and the rejection operator (slope $\beta_0$).]

In most applications of the linear learning model it is also assumed that $\beta_0 = \beta_1 = \beta$, i.e., that the slopes of the purchase and rejection operators are equal. In this case, the influence of past purchases on the present purchase probability is geometrically weighted, with the most recent purchase having the greatest weight.²³

In Figure 3-1 we note that $P_t$ and $P_{t+1}$ are bounded by $L_A$ and $U_A$. That is,

$$L_A \le P_t \le U_A$$

and

$$L_A \le P_{t+1} \le U_A.$$

This implies that learning never goes to completion. That is, the consumer is never absolutely certain to purchase one brand or the other.

²³ Note that geometric weightings are the discrete analogue of exponential weightings.
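Both properties of the equal-slope model can be checked numerically. The Python sketch below is an illustration under stated assumptions: hypothetical parameter values with $0 \le \beta < 1$ and $\alpha_0 \le \alpha_1$, and $L_A$, $U_A$ taken to be the fixed points of the rejection and purchase operators, $\alpha_0/(1-\beta)$ and $\alpha_1/(1-\beta)$ (where each operator line crosses $P_{t+1} = P_t$). It unrolls the recursion into $P_n = \beta^n P_0 + \sum_{k=0}^{n-1} \beta^k \alpha_{x_{n-1-k}}$, where $\alpha_{x}$ is $\alpha_1$ or $\alpha_0$ according to the brand bought, so the brand-dependent part of a purchase $k$ trials back carries weight $(\alpha_1-\alpha_0)\beta^k$.

```python
# Equal-slope linear learning model (beta_0 = beta_1 = beta).
# Assumptions: 0 <= beta < 1 and alpha0 <= alpha1; parameter values hypothetical.


def step(p, bought_a, alpha1, alpha0, beta):
    """One trial: alpha1 + beta*P_t after brand A, alpha0 + beta*P_t otherwise."""
    return (alpha1 if bought_a else alpha0) + beta * p


def recursive_path(history, p0, alpha1, alpha0, beta):
    """[P_0, ..., P_n] obtained by applying the operators trial by trial."""
    path = [p0]
    for choice in history:
        path.append(step(path[-1], choice == "A", alpha1, alpha0, beta))
    return path


def geometric_form(history, p0, alpha1, alpha0, beta):
    """Unrolled P_n: beta**n * P_0 plus, for each past trial k steps back
    (k = 0 is the most recent), beta**k times that trial's intercept."""
    n = len(history)
    total = beta ** n * p0
    for k in range(n):
        choice = history[n - 1 - k]
        total += beta ** k * (alpha1 if choice == "A" else alpha0)
    return total


def bounds(alpha1, alpha0, beta):
    """(L_A, U_A): fixed points of the rejection and purchase operators."""
    return alpha0 / (1 - beta), alpha1 / (1 - beta)


alpha1, alpha0, beta = 0.3, 0.05, 0.6
history = "AAOAOOAAAA"
path = recursive_path(history, 0.4, alpha1, alpha0, beta)
lower, upper = bounds(alpha1, alpha0, beta)
print(path[-1], geometric_form(history, 0.4, alpha1, alpha0, beta))
print(lower, upper, all(lower <= p <= upper for p in path))
```

A long run of brand A purchases drives $P_t$ toward $U_A$ (here 0.75) without ever reaching 1, which is the sense in which learning never goes to completion.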

Kuehn has not directly tested or estimated the linear learning model described above in any of his published work. In his thesis (Kuehn, 1958) he used factorial analysis in an attempt to isolate the effects of past purchases. His results for past histories of four successive purchases are given in Table 3-1. The data are purchases of frozen orange juice where brand A is Snow Crop and brand O is all others.

Table 3-1
KUEHN'S LEARNING MODEL RESULTS

Brand Choice on Four     Probability of Purchasing
Prior Purchases          Brand A on the Next Trial
--------------------     -------------------------
AAAO                     0.486
AAOA                     0.595
AOAA                     0.665
OAAA                     0.690
AAOO                     0.305
AOAO                     0.405
OAAO                     0.414
AOOA                     0.565
OAOA                     0.497
OOAA                     0.552
AOOO                     0.154
OAOO                     0.129
OOAO                     0.191
OOOA                     0.330
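The recency pattern in Table 3-1 can be checked mechanically. The Python sketch below transcribes the table and assumes that the rightmost letter of each history is the most recent purchase (the table does not say so explicitly, but the pattern of probabilities suggests it). It checks that, among histories with a single O, the probability rises as the O recedes into the past, and that within each count of A purchases every history ending in A beats every history ending in O.

```python
# Table 3-1 (Kuehn, frozen orange juice; A = Snow Crop, O = all others).
# Assumption: the rightmost letter is the most recent purchase.
TABLE_3_1 = {
    "AAAO": 0.486, "AAOA": 0.595, "AOAA": 0.665, "OAAA": 0.690,
    "AAOO": 0.305, "AOAO": 0.405, "OAAO": 0.414, "AOOA": 0.565,
    "OAOA": 0.497, "OOAA": 0.552,
    "AOOO": 0.154, "OAOO": 0.129, "OOAO": 0.191, "OOOA": 0.330,
}


def single_o_in_order():
    """Probabilities for the one-O histories, ordered from the O being the
    most recent purchase to the O being the oldest purchase."""
    return [TABLE_3_1[h] for h in ("AAAO", "AAOA", "AOAA", "OAAA")]


def last_purchase_dominates():
    """For each number of A purchases, check that every history ending in A
    has a higher probability than every history ending in O."""
    for n_a in range(1, 4):
        group = {h: p for h, p in TABLE_3_1.items() if h.count("A") == n_a}
        ends_a = [p for h, p in group.items() if h.endswith("A")]
        ends_o = [p for h, p in group.items() if h.endswith("O")]
        if min(ends_a) <= max(ends_o):
            return False
    return True


probs = single_o_in_order()
print(probs)                      # [0.486, 0.595, 0.665, 0.69]
print(probs == sorted(probs))     # True
print(last_purchase_dominates())  # True
```

The single-A histories are not perfectly ordered (OAOO is below AOOO), so the check is deliberately limited to the comparisons that do hold.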
The results reported in Table 3-1 illustrate the increasing effect of the most recent purchases. These results and certain others do appear to be consistent with the linear learning model of consumer brand choice. But it must be borne in mind that:

A. These results assumed that all consumers had the same $P_0$; thus there will be a confounding of heterogeneity with the learning effect, as pointed out by Frank (1962).

B. The model has not been directly tested.

C. A model which incorporates both heterogeneity and nonstationarity may provide a better model of the process.
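Point A can be illustrated with a hypothetical population in which nobody learns at all. In the Python sketch below (the two segments and their weights are invented for illustration), each consumer buys brand A with a fixed probability on every trial, but that probability differs between segments. Pooling the segments and conditioning on past purchases, computed exactly by Bayes' rule, still yields repurchase probabilities that rise with the number of past A purchases, which is the kind of spurious "learning" Frank pointed out.

```python
# Hypothetical population with no learning: each consumer buys brand A with a
# fixed probability p on every trial (a zero-order Bernoulli process), but p
# differs across segments. Weights and probabilities are illustrative only.
SEGMENTS = [(0.5, 0.8), (0.5, 0.2)]  # (population weight, p)


def aggregate_next_a(history):
    """P(next purchase is A | history) for a consumer drawn at random from
    SEGMENTS, computed exactly by Bayes' rule over the segments."""
    num = den = 0.0
    for weight, p in SEGMENTS:
        likelihood = weight
        for choice in history:
            likelihood *= p if choice == "A" else 1.0 - p
        den += likelihood
        num += likelihood * p
    return num / den


for h in ("OOOO", "OOOA", "OOAA", "OAAA", "AAAA"):
    print(h, round(aggregate_next_a(h), 3))
```

In this mixture, histories with the same number of A purchases get identical predictions regardless of their order, because each consumer's own process has no memory.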

The  models  of  Howard  [1965],  Montgomery  [1966],  and  Morrison  [1965b]  are 
worth  considering  under  item  C.   Papers  dealing  with  the  latter  two  approaches 
follow  this  chapter. 

Kuehn  also  found  an  exponential  decay  in  repurchase  probability  as  the 
time  between  purchases  increases.   This  result  was  not  corroborated  by  the 
work  of  Carman  (1966)  and  Morrison  (1966  ).   This  may,  of  course,  be  due  to 
the  fact  that  Kuehn  used  frozen  orange  juice  data  while  Carman  and  Morrison 
used  dentifrice  and  coffee,  respectively. 

In closing this discussion of Kuehn's work, it should be noted that he was one of the first researchers to investigate the application of learning models to consumer behavior. His work has served as a stimulus to more general approaches in stochastic modeling of consumer behavior.

Haines. Haines (1964) used a modified form of the linear learning model as a model of market behavior after innovation. The modification involved the rejection operator. Since the model is to be of a market after innovation, it was felt that a no-purchase or rejection trial should not affect the probability of a purchase on future trials. That is, a no-purchase trial with respect to the innovation does not alter the probability of a purchase on future trials.²⁴

For time series data from thirty-four geographic areas, Haines developed aggregate market measures in terms of his modified learning model. The measures were the asymptotic market share for the new product and the rate at which the market was approaching its asymptotic value. The model generally was a good fit to the data with no significance level greater than 0.10.

Haines then attempted to estimate the rate of approach to equilibrium (or asymptote) and the asymptotic market share as functions of various marketing policies. The rate of approach to equilibrium was found to relate to the amount spent on advertising during the first two months and to the prior availability of the good. The equilibrium market share, adjusted for the population in each region, was related to per capita promotional expenditures. A crucial assumption in Haines' model formulation is that there be no ready substitutes for the innovation. He examines the potential bias which can exist when this assumption is violated.

Haines' basic approach is a sound one. He used a stochastic model of the dynamics of consumer choice behavior to estimate behavior within selected geographic segments. His model provides summary estimates of the behavioral dynamics within each segment as well as measures of "goodness of fit" of the model to the data. He then uses these interpretable model parameters as the dependent variables in regressions on market decision variables. This strategy should receive increasing attention in attempts to model markets and to measure the impact of marketing policy variables. For details of his model see his paper at the end of this chapter.

²⁴ While this assumption appears to be reasonable prior to the first purchase of a consumer nondurable innovation, it seems less reasonable once the innovation has, in fact, been tried. Perhaps the rejection operator should be divided into two conditional operators. Haines' operator would be applied to no-purchase trials prior to the first purchase, while the standard form of the rejection operator would be applied once an initial purchase has been made. It should be noted that this more realistic approach will complicate the mathematics of the process, perhaps even rendering the model intractable.
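Haines' modified rejection operator, and the conditional-operator variant suggested in the footnote on his no-purchase assumption, can be compared side by side at the level of a single consumer. This is a hypothetical Python sketch, not Haines' own formulation or estimation procedure; the parameter values are invented, and "A" marks a purchase of the innovation while "O" marks a no-purchase trial.

```python
# Hypothetical sketch; parameter values are invented for illustration.


def haines_step(p, bought, alpha, beta):
    """Purchase operator as in the linear learning model; a no-purchase
    trial leaves the probability unchanged (Haines' modification)."""
    return alpha + beta * p if bought else p


def two_stage_step(p, bought, tried, alpha1, beta1, alpha0, beta0):
    """Identity operator for no-purchase trials before the first purchase,
    standard rejection operator afterwards. Returns (new_p, tried)."""
    if bought:
        return alpha1 + beta1 * p, True
    if not tried:
        return p, False
    return alpha0 + beta0 * p, True


def haines_path(history, p0, alpha, beta):
    path = [p0]
    for choice in history:
        path.append(haines_step(path[-1], choice == "A", alpha, beta))
    return path


def two_stage_path(history, p0, alpha1, beta1, alpha0, beta0):
    path, tried = [p0], False
    for choice in history:
        p, tried = two_stage_step(path[-1], choice == "A", tried,
                                  alpha1, beta1, alpha0, beta0)
        path.append(p)
    return path


print(haines_path("OOAOA", 0.1, 0.3, 0.6))
print(two_stage_path("OOAOA", 0.1, 0.3, 0.6, 0.05, 0.6))
```

Under Haines' operator the probability can only stay put or move toward $\alpha/(1-\beta)$; in the variant, a no-purchase trial after the innovation has been tried pulls the probability back down.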

Carman. Carman (1966) notes that Kuehn has published little empirical evidence in support of the linear learning model, and further, that the results which have been published do not directly test the linear learning hypothesis. Using M.R.C.A. panel data for dentifrice purchases in the period immediately following the American Dental Association's endorsement of Crest in August 1960, Carman addressed himself to the following problems:

1. A direct test of the linear learning model in a product class for which empirical evidence has not been published (i.e., dentifrice).

2. A test of Kuehn's finding of a decay in repurchase probability as interpurchase time increases.

3. A test of Frank's hypothesis of spurious learning due to aggregation of dissimilar consumers.

Carman used the special case of the linear learning model for which the "purchase operator" and the "rejection operator" have the same slope. As empirical observations for his estimation procedure, he developed weighted empirical frequencies of the probability triplet $(P_t, P_{t+1}^A, P_{t+1}^O)$, where:

$P_t$ = probability of purchasing Crest at time $t$

$P_{t+1}^A$ = probability of purchasing Crest at time $t+1$ given that a purchase of Crest was made at time $t$

$P_{t+1}^O$ = probability of purchasing Crest at time $t+1$ given that a purchase of some other brand was made at time $t$

These empirical frequencies were used to estimate the parameters of the purchase and rejection operators by least squares regressions. Carman derived his own set of normal equations, but he could have taken advantage of the fact that his equal-slope constraint makes a dummy variable approach feasible. His method also assumes homogeneity among consumers in terms of their initial probability of purchasing Crest. That is, $P_0$ is assumed the same for all consumers in one application of the model.
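The dummy-variable shortcut amounts to stacking the two operators into one regression, $P_{t+1} = \alpha_0 + (\alpha_1 - \alpha_0)D + \beta P_t$, where $D = 1$ for the $P_{t+1}^A$ observation of a triplet and $D = 0$ for the $P_{t+1}^O$ observation. The following is a minimal, unweighted sketch in plain Python on synthetic noise-free triplets generated from hypothetical parameters; Carman's weighting of the empirical frequencies is not reproduced, and the helper names are illustrative.

```python
# Dummy-variable least squares for the equal-slope linear learning model.
# Synthetic data and parameter values are hypothetical.


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_equal_slope(triplets):
    """Least squares fit of P_{t+1} = alpha0 + (alpha1 - alpha0)*D + beta*P_t.
    Each triplet (P_t, P_A, P_O) contributes one row with D = 1 and one with D = 0."""
    rows, ys = [], []
    for p_t, p_a, p_o in triplets:
        rows.append([1.0, 1.0, p_t])
        ys.append(p_a)
        rows.append([1.0, 0.0, p_t])
        ys.append(p_o)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    intercept, dummy, beta = solve(xtx, xty)
    return {"alpha0": intercept, "alpha1": intercept + dummy, "beta": beta}


triplets = [(p, 0.3 + 0.6 * p, 0.05 + 0.6 * p) for p in (0.2, 0.35, 0.5, 0.65)]
print(fit_equal_slope(triplets))
```

Because the slope is shared, a single coefficient on $P_t$ is estimated from both kinds of observation, and the dummy coefficient directly measures the gap between the two intercepts.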

Within the limitations of his operational definitions and methods, the dentifrice data do not appear to be inconsistent with a linear learning model. The coefficients of determination ranged from 0.67 to 0.99.²⁵

²⁵ The coefficient of determination is the square of the multiple correlation coefficient and represents the proportion of the total variance in the dependent variable which is accounted for by the linear relation to the independent variables.

In order to test Kuehn's hypothesis of a time decay in repurchase probability, Carman segmented the households according to their average interpurchase time in the dentifrice market. The predicted decay in repurchase probability as interpurchase time increases did not occur in the data. This finding that the time decay is insignificant is consistent with results obtained by Morrison (1965b) for regular coffee.

To test the Frank hypothesis, Carman identified a group of "switchers" defined by their brand purchase sequence on the three purchases prior to the period under study. For this group of switchers, he concludes that the apparent learning could not all have been caused by over-aggregation. He bases this conclusion upon the results of a regression and a scatter diagram presented as Figure 3, p. 30 of Carman (1966). (From the scatter of points in the diagram it is difficult to see how a coefficient of determination of 0.91 was achieved.)

Carman also considered the projected equilibrium brand share for Crest. For the period immediately following the endorsement, the brand share projections were approximately 72% for all subgroups. For later periods these projections dropped to more reasonable levels. This suggests that the learning process itself, if it exists, is nonstationary. That is, $\alpha_1$, $\alpha_0$, and $\beta$ change over time, presumably under the impact of competitive reaction to the Crest success.
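Under the homogeneous equal-slope model the projected equilibrium share has a simple closed form. Since brand A is bought with probability $P_t$, $E[P_{t+1} \mid P_t] = \alpha_0 + (\alpha_1 - \alpha_0 + \beta)P_t$, so the expected share settles at $\alpha_0/(1 - \beta - \alpha_1 + \alpha_0)$ provided $|\alpha_1 - \alpha_0 + \beta| < 1$. This is a consequence of the model as written here, not necessarily the projection method Carman used; the two parameter sets below are hypothetical and only show how a shift in the parameters, as in a nonstationary process, moves the projection.

```python
# Projected equilibrium share in the homogeneous equal-slope model.
# Parameter sets are hypothetical; stability needs |alpha1 - alpha0 + beta| < 1.


def expected_next(m, alpha1, alpha0, beta):
    """E[P_{t+1}] given E[P_t] = m: brand A is bought with probability P_t,
    so E[P_{t+1} | P_t] = alpha0 + (alpha1 - alpha0 + beta) * P_t."""
    return alpha0 + (alpha1 - alpha0 + beta) * m


def equilibrium_share(alpha1, alpha0, beta):
    """Fixed point of expected_next: the projected long-run share of brand A."""
    return alpha0 / (1 - beta - (alpha1 - alpha0))


def iterate(m0, n, alpha1, alpha0, beta):
    """Apply expected_next n times starting from expected share m0."""
    m = m0
    for _ in range(n):
        m = expected_next(m, alpha1, alpha0, beta)
    return m


early = (0.3, 0.05, 0.6)   # (alpha1, alpha0, beta), hypothetical
later = (0.25, 0.03, 0.6)  # hypothetical shifted parameters
for label, params in (("early", early), ("later", later)):
    print(label, equilibrium_share(*params), iterate(0.2, 200, *params))
```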
Massy. In the above applications of the linear learning model it is assumed that all respondents start with the same probability of making response 1 versus response 0. Carman, of course, disaggregated into somewhat more homogeneous groups here. In any case this appears to be an artificial assumption. Massy (1965) has developed procedures for estimating the linear learning model when the initial response probability is distributed across the population of respondents. He presents a minimum chi-square as well as an approximate regression procedure. The former is computationally burdensome, yet it yields a single overall goodness-of-fit measure for the linear learning hypothesis while at the same time providing best asymptotically normal estimates of the parameters. In this case the asymptotic results are achieved as the number of respondents or households entering the analysis is increased. The approximate least squares procedure leads to difficulties in the interpretation of the properties of the estimates. This paper makes a substantial contribution on two fronts: (1) its content and results, and (2) its public presentation of estimation procedures for the linear learning model.²⁶

²⁶ This has been remarkable in its absence in the past.

Demetz. A somewhat different use of consumer "learning" is discussed by Demetz (1962). Using an econometric model of the entire frozen orange juice industry and Chicago Tribune panel data from 1950 through 1957, he investigated the following propositions:

1. Is there a learning process operating in the market which depends solely on the age of that market?

2. Is the learning process a function of personal product experience?

His interest centered on the question of whether consumers learn to ignore certain artificial distinctions between brands. In the frozen orange juice market, nationally advertised brands generally command a higher price than private labels, even though the private label brands are often merely a manufacturer's nationally advertised brand with a private label attached. Thus the question is whether consumers learn to ignore the artificial product distinction generated by the advertised brand name. Frozen orange juice became widely distributed in the United States during 1948-49, just prior to the data period used in this study.

Demetz  used  the  following  model  to  answer  question  1: 

where the subscript 1 refers to nationally advertised (high-priced) brands and the subscript 2 refers to private label (low-priced) brands. Other notation is defined as:

$p$ = the average price per ounce

$q$ = the number of ounces sold

$t$ = the number of months since January 1950

The remaining items ($A$, $\alpha$, $\beta$, $\gamma$) are parameters of the functional relationship which are to be estimated from the data. Demetz used $p_2$ as a proxy variable to measure the absolute price level. His justification rests on the tendency for the average prices of national and private label brands to fluctuate together (i.e., $p_1$ and $p_2$ tend to increase or decrease together).

It remains to consider what interpretation may be given to the various parameter estimates which may result. Since the parameter $A$ is merely a scale factor, we may ignore it in this discussion of the structural implications of the parameter estimates. The remaining three parameters may be positive, zero, or negative. Each of these is considered in turn. We first consider $\gamma$, the exponent of $t$ in the model. If $\gamma$ is positive, this means that as the time since the introduction of frozen orange juice increases, the share of the market going to the higher-priced national brands increases. Thus it would seem that promotion had led to significant market discrimination between similar products. Conversely, if $\gamma$ is negative we may infer that as time increases consumers tend to learn that the private labels are equally good. A zero value of $\gamma$ indicates that there is no time trend in the market in terms of the relative share between national and private label brands. That is, question 1 may be answered in the negative. For the parameter $\alpha$, the exponent of the ratio of average prices of national brands to private label brands, a positive value would indicate that consumers tend to purchase relatively more national brands as their price increases relative to private labels. While such a price-quality association may be reasonable in certain product markets, it would be counter-intuitive in the product class under study. A negative value of $\alpha$ would indicate that the relative market share for national brands decreases as their relative price increases. A zero value would indicate either that relative price doesn't matter much in this market or that there was very little variation in the $p_1/p_2$ ratio. Finally, a positive value of $\beta$ would indicate that the market shifts to private labels as the general level of prices decreases. Conversely, a negative value for $\beta$ indicates that as the general price level for frozen orange juice decreases there is a shift toward the relatively expensive national brands. A zero value would indicate that the absolute price level, as measured by the proxy variable $p_2$, does not affect the relative market shares for these classes of brands.
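The parameter interpretations above are consistent with a multiplicative relationship such as $q_1/q_2 = A\,(p_1/p_2)^{\alpha}\,p_2^{\beta}\,t^{\gamma}$. Treating that form purely as an assumption (it is not stated explicitly in this text), taking logarithms makes it linear in $\log A$, $\alpha$, $\beta$, and $\gamma$, so it can be fitted by ordinary least squares. The sketch below does this in plain Python on synthetic, noise-free data generated from hypothetical parameter values; it does not reproduce Demetz's data or estimates.

```python
import math

# Log-linear OLS for an ASSUMED form q1/q2 = A * (p1/p2)**alpha * p2**beta * t**gamma.
# Data and "true" parameter values are synthetic and hypothetical.


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_log_linear(rows):
    """OLS on log(q1/q2) = log A + alpha*log(p1/p2) + beta*log(p2) + gamma*log(t).
    rows: iterable of (q1, q2, p1, p2, t). Returns (A, alpha, beta, gamma)."""
    xs, ys = [], []
    for q1, q2, p1, p2, t in rows:
        xs.append([1.0, math.log(p1 / p2), math.log(p2), math.log(t)])
        ys.append(math.log(q1 / q2))
    k = 4
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    log_a, alpha, beta, gamma = solve(xtx, xty)
    return math.exp(log_a), alpha, beta, gamma


TRUE = (2.0, -1.5, -0.3, -0.2)  # hypothetical A, alpha, beta, gamma
data = []
for t in range(1, 37):  # months since January 1950
    p2 = 0.02 * (1 + 0.5 * (t % 5))
    p1 = p2 * (1.1 + 0.3 * (t % 3))
    a, alpha, beta, gamma = TRUE
    ratio = a * (p1 / p2) ** alpha * p2 ** beta * t ** gamma
    data.append((ratio, 1.0, p1, p2, t))  # q2 fixed at 1, so q1/q2 = ratio
print(fit_log_linear(data))
```

With real data the residuals would not vanish, and the significance judgments discussed next would come from the standard errors of these coefficients.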

Demetz found that both $\alpha$ and $\gamma$ were significantly negative. That is, question 1 is answered in the affirmative, with the additional inference that consumers "learn" about nonessential product differences and that relative price does cause shifts between these two classes of brands. The parameter $\beta$ was also found to be negative, but was not significant.
In order to explore question 2, Demetz correlated the share of purchases a given set of families devoted to advertised brands with the number of years the families in the set had been purchasing frozen orange juice. He found a significant negative correlation between the purchases devoted to national brands and length of product market experience. Thus he answers question 2 in the affirmative.

His conclusion is that consumers are not "puppets," but that they learn to detect trivial brand differences. He believes that these results may be generalized to any low-cost, frequently purchased, relatively simple items.
3.4 Summary

In  this  discussion  we  have  considered  a  number  of  factors  relating  to 
stochastic  models  of  consumer  behavior.   We  first  examined  certain  properties 
which  we  would  like  our  models  to  have.   Secondly,  we  discussed  problems  in 
the  stochastic  model  approach.   Finally,  we  considered  several  applications  of 
stochastic  models.   The  principal  contribution  of  these  models  has  been  in 
their  structural  implications  and  their  capacity  for  generating  interesting 
summary  measures  of  market  behavior. 